Mining Vehicles Frequently Appearing Together from Massive Passing Records


Appl. Math. Inf. Sci. 9, No. 3, 1427-1433 (2015)
Applied Mathematics & Information Sciences, An International Journal
http://dx.doi.org/10.12785/amis/090337

Dongjin Yu 1,*, Wensheng Dou 1, Wanqing Li 1, Suhang Zheng 1 and Jianhua Shao 2,3

1 Hangzhou Dianzi University, Hangzhou, China
2 Zhejiang Topcheer Information Technology Co., Ltd, Hangzhou, China
3 Zhejiang Provincial Key Laboratory of Network Technology and Information Security, China

Received: 7 Aug. 2014, Revised: 8 Nov. 2014, Accepted: 9 Nov. 2014
Published online: 1 May 2015

Abstract: Vehicles Frequently Appearing Together, or VFATs, can be clues in solving criminal cases. Traditional sequence mining approaches help identify VFATs from passing-through records collected at monitoring sites. However, huge traffic data streams hinder fast identification of VFATs. In this paper, we present a multi-threaded approach to fast identification of VFATs based on multi-core processors, called Frequent Sequential Mining based on Multi-Cores (FSMMC). It parallelizes the execution of tasks, partitions large volumes of data, and obtains VFATs by merging local candidates discovered in different threads running on different processor cores. Through local parallel reduction, FSMMC eliminates repetitive patterns and reduces computational effort. Moreover, it achieves workload balance by dynamically distributing tasks to a pool of threads, where the thread that finishes first joins another running thread. Both theoretical analysis and case studies show that FSMMC takes full advantage of multi-core computing platforms and achieves a higher speed-up when searching for VFATs among massive passing-through records, compared with approaches without multithreading.

Keywords: massive data mining, parallel, sequential patterns, multi-core, Vehicles Frequently Appearing Together

1 Introduction

When solving criminal cases, Vehicles Frequently Appearing Together, or VFATs, can sometimes be valuable clues.
Collecting records on vehicles passing through different monitoring sites and then searching for vehicles that frequently appear together has proven to be an effective way to find VFATs. However, such an investigation always involves large traffic streams and therefore takes a long time. Moreover, VFATs have high mobility and can usually escape notice. How to quickly identify VFATs from massive traffic data streams therefore becomes a key issue.

In recent years, various methods of data mining have matured and been applied widely in various fields, including the discovery of motifs in DNA sequences, the analysis of web logs and customer shopping sequences, and the study of XML query access patterns [1]. Frequent pattern discovery, or sequential mining, which was pioneered by the work of Agrawal et al. on the Apriori algorithm [2], could be used to find VFATs. Given a minimum support threshold min_sup, the frequent pattern problem is to discover all the item sets that occur at least min_sup times in the database. Here, vehicles frequently appearing together can be regarded as frequent patterns, i.e., they often appear somewhere as a whole.

High-performance computation utilities, such as multi-core and many-core servers, offer ideal mining platforms. The problem in finding VFATs is therefore how to fully exploit the parallelism, or harness the power, of these multi-core processors. A number of works have focused on parallel formulations for finding frequent patterns on shared-memory computers and GPU nodes [3, 4]. Indeed, many parallel computing models already exist. For example, OpenMP is a well-known parallel framework supporting multi-platform shared-memory parallel programming in C/C++. Although OpenMP is simple to use because of its automatic data layout and decomposition by directives, it lacks reliable error handling and fine-grained mechanisms to control thread-processor mapping.

In this paper, we present a novel parallel frequent sequential mining approach employing multi-cores, called FSMMC (Frequent Sequential Mining based on Multi-Cores), to search for VFATs. FSMMC takes threads as the parallel unit and can minimize memory bandwidth and maximize cache reuse. Both theoretical analysis and case studies indicate that it is an efficient multi-core implementation.

The structure of the paper is as follows. In section 2, we briefly define the problem of extracting frequent patterns with multi-core processors, describe the approach in depth, and provide the necessary theoretical background. In section 3, we theoretically evaluate the performance of the approach and show that the method is precise, has lower computational complexity, and is more feasible. Section 4 presents the detailed experimental results, comparing the approach with different numbers of running threads on a multi-core processor. In section 5 we then review the current state of parallel pattern mining technology. Finally, section 6 concludes the paper and gives directions for future work.

* Corresponding author e-mail: yudj@hdu.edu.cn
Natural Sciences Publishing Cor.

2 The FSMMC Approach

2.1 Definitions

Definition 1 The itemset of a vehicle passing record in a given database $D$ is denoted as a quadruple, i.e., $I_{p,t,l,d} = \langle p, t, l, d \rangle$, in which $p$ represents the vehicle plate number, $t$ represents the time of passing through monitoring sites, $l$ represents the location of monitoring sites, and $d$ represents the vehicle driving direction.

Definition 2 A sequence $S$ is an ordered list of itemsets in a period of time $D_t$, denoted as $S = \langle I_{p_1,t_1,l_1,d_1}, I_{p_2,t_2,l_2,d_2}, \ldots, I_{p_m,t_m,l_m,d_m} \rangle$.

Definition 3 The term motorcade sequence is used to represent the group of vehicles passing through the same monitoring sites in the same direction within the time interval $\delta t$, denoted as:

$$S_m = (p_1, \ldots, p_i, \ldots, p_j, \ldots, p_n), \tag{1}$$

where $\langle I_{p_1,t_1,l,d}, \ldots, I_{p_i,t_i,l,d}, \ldots, I_{p_j,t_j,l,d}, \ldots, I_{p_n,t_n,l,d} \rangle \subseteq S$, $|t_i - t_j| \le \delta t$, $i, j, n \in \{1, \ldots, m\}$, $l \in \{l_1, \ldots, l_m\}$, and $d \in \{d_1, \ldots, d_m\}$. The length of $S_m$, denoted as $|S_m|$, is the number of itemsets $S_m$ holds.
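To make Definitions 1 to 3 concrete, the following is a minimal Python sketch for the two-vehicle case. It is only an illustration under stated assumptions, not the authors' implementation: the names `PassingRecord` and `motorcade_pairs` are hypothetical, passing times are assumed to be in seconds, and a pair counts as a motorcade when both vehicles pass the same site in the same direction no more than `delta_t` seconds apart.

```python
from collections import defaultdict
from typing import NamedTuple


class PassingRecord(NamedTuple):
    """One itemset <p, t, l, d> from Definition 1 (field names are assumed)."""
    plate: str      # p: vehicle plate number
    time: int       # t: passing time, in seconds
    site: str       # l: monitoring site
    direction: str  # d: driving direction


def motorcade_pairs(records, delta_t):
    """Return (plates, site, direction) for every pair of distinct vehicles
    that pass the same site in the same direction within delta_t seconds."""
    lanes = defaultdict(list)
    for r in records:
        lanes[(r.site, r.direction)].append(r)

    pairs = []
    for (site, direction), lane in lanes.items():
        lane.sort(key=lambda r: r.time)
        for i, a in enumerate(lane):
            for b in lane[i + 1:]:
                if b.time - a.time > delta_t:
                    break  # the lane is sorted, so later records are further away
                if a.plate != b.plate:
                    plates = tuple(sorted((a.plate, b.plate)))
                    pairs.append((plates, site, direction))
    return pairs


# Hypothetical passing records, for illustration only.
records = [
    PassingRecord("A1", 0, "s1", "N"),
    PassingRecord("B2", 30, "s1", "N"),
    PassingRecord("C3", 200, "s1", "N"),
    PassingRecord("A1", 1000, "s2", "N"),
    PassingRecord("B2", 1040, "s2", "N"),
    PassingRecord("C3", 1045, "s2", "S"),
]
pairs = motorcade_pairs(records, delta_t=60)
print(pairs)
```

With these records, vehicles A1 and B2 are paired at both sites, while C3 is excluded because it is either too late or travelling in the opposite direction.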
Definition 4 Given a database $D$ storing the vehicle passing records, the support of the sequence $S_m$, or its reliability as a suspect VFAT, is denoted as $\mathrm{sup}(S_m) = S_t S_l / D_t$, where $S_t$ denotes the number of occurrences of $S_m$, and $S_l$ denotes the number of monitoring sites $S_m$ passes.

Definition 5 Given a minimum support threshold min_sup, if $\mathrm{sup}(S_m) \ge$ min_sup, $S_m$ is called a frequent sequence $S$, i.e., a VFAT. The collection of $S$ is denoted as $L_N$ when $|S| = N$.

Definition 6 Given a database $D$ storing the vehicle passing records, all $S_m$ it holds are called candidate sequences, denoted as $C_N$ when $|S_m| = N$. In other words, $L_N = \{ C_N \mid \mathrm{sup}(C_N) \ge \text{min\_sup} \}$.

Definition 7 In the multi-thread environment, the subsequence of $C_N$ acquired on thread $i$ is called $C_N^i$, with $\bigcup_{i=1}^{n} C_N^i = C_N$, where $n$ is the total number of running threads. To distinguish different motorcade sequences in $C_N^i$, $V^{S_m}_{C_N^i}$ is used to represent the set of $C_N^i$ whose sequential value is $S_m$.

Definition 8 Given a sequence database $D$ storing the vehicle passing records, let $T_s$ be the serial sequence mining time with a single-core processor, and let $T(q)$ be the parallel sequence mining time with a $q$-core processor. The speed-up is then defined as $S(q) = T_s / T(q)$.

2.2 The FSMMC Approach

The FSMMC approach is designed to be executed on a shared memory system. It partitions the workload into independent tasks, but assumes that the whole dataset is accessible to all threads. In this way, each thread runs independently through lock-free programming without the need for inter-thread communication. To exploit the properties of multi-core processors, the FSMMC approach is divided into three phases: 1) the global database is divided into several local datasets, one for each thread, by means of the equidistant static projection method; 2) local motorcade sequential patterns are located in each thread by local parallel reduction; 3) local motorcade patterns are dynamically combined into the frequent sequential patterns. These phases are illustrated in Figure 1.
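As a rough illustration of how the three phases fit together with the support of Definition 4, the sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`. It is a simplified sketch under stated assumptions rather than the authors' Java implementation: the function names `partition`, `local_reduce` and `mine` are hypothetical, each observation is assumed to be a `(plates, site)` pair that has already been extracted from the passing records, and the `period` argument stands in for $D_t$. The merge in phase 3 is done sequentially here for brevity, whereas FSMMC distributes merge tasks to threads.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(observations, n_chunks):
    """Phase 1: split the data into contiguous chunks of (nearly) equal size."""
    size = -(-len(observations) // n_chunks)  # ceiling division
    return [observations[i * size:(i + 1) * size] for i in range(n_chunks)]


def local_reduce(chunk):
    """Phase 2: collapse repeated patterns inside one chunk into counts
    and the set of monitoring sites each pattern was seen at."""
    counts = Counter()
    sites = defaultdict(set)
    for plates, site in chunk:
        counts[plates] += 1
        sites[plates].add(site)
    return counts, sites


def mine(observations, n_threads, n_chunks, period, min_sup):
    """Run phases 1 to 3 and return {plates: support} for frequent patterns."""
    chunks = partition(observations, n_chunks)
    # Idle workers pick up the next pending chunk from the pool's queue.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        local_results = list(pool.map(local_reduce, chunks))

    # Phase 3 (sequential in this sketch): merge the local results.
    total = Counter()
    covered = defaultdict(set)
    for counts, sites in local_results:
        total.update(counts)
        for plates, seen in sites.items():
            covered[plates] |= seen

    # Definition 4: support = occurrences * sites covered / period.
    support = {p: total[p] * len(covered[p]) / period for p in total}
    return {p: s for p, s in support.items() if s >= min_sup}


# Hypothetical observations, for illustration only.
observations = [
    (("A1", "B2"), "s1"),
    (("A1", "B2"), "s2"),
    (("A1", "C3"), "s1"),
    (("A1", "B2"), "s3"),
    (("A1", "C3"), "s1"),
]
frequent = mine(observations, n_threads=2, n_chunks=3, period=3, min_sup=2.5)
print(frequent)
```

Using more chunks than threads lets a worker that finishes early take another chunk, which loosely mirrors the dynamic task distribution described in the text. Note that in CPython the global interpreter lock limits the speed-up of CPU-bound threads, so this sketch shows the structure of the computation rather than its performance.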
1) The complete database $D$ is partitioned into $D_i$, and each $D_i$ is assigned to thread $i$ ($i = 1, 2, \ldots, n$) for loading. The global database $D$ is divided into $D_1, D_2, \ldots, D_n$, with $D = \bigcup_{i=1}^{n} D_i$. If there are $R$ records in $D$, then the records $R_i$ for thread $i$ are given by (2). Here, $M_j$ represents record $j$ in the local database $D_i$ and $T_p$ represents record $p$ in the global database $D$.

$$R_i = \left\{ M_j \;\middle|\; M_j = T_p,\; p = \frac{R}{n}(i-1) + j,\; p \in \left[ \frac{R}{n}(i-1),\; \frac{R}{n}\, i \right] \right\} \tag{2}$$

2) For thread $i$ ($i = 1, 2, \ldots, n$), the local database is scanned once to find all motorcade sequential patterns; where necessary, they are then reduced and stored in files. Because the database is usually too large to be held wholly in memory, FSMMC divides it further into smaller datasets. It then spawns $n$ threads, each scanning one of these smaller datasets to obtain candidate sequences. Since this is one of the most costly steps, we use local parallel reduction to eliminate the repetitive patterns. It is very likely that one task has a lower computational cost than all the others. Therefore, FSMMC creates a thread pool within which each thread is assigned to one pattern-searching task. Threads that finish searching first join in with other threads. In other words, FSMMC allows each thread to proceed asynchronously, which helps to save space and reduce running time.

3) Local motorcade patterns are combined in each storing file and the final frequent sequences are derived. After being processed by each thread, the reduction objects need to be merged. First, FSMMC puts the tasks of combining files in a global task list after the files have been regularly marked, making sure each task has a number corresponding to its rank. Then, every thread selects a task from the list as its own assignment and independently eliminates infrequent motorcade items. Since all threads are independent of each other, only their calculation workloads need to be balanced in order to boost performance. FSMMC repeatedly checks whether there is an idle core. If one exists, it selects a new task from the global task list and runs it. All frequent motorcade sequences have been identified when the task list becomes empty.

Fig. 1: The process of the FSMMC approach.

3 Performance Evaluation

In this section, we evaluate the performance of FSMMC by examining its running time. Suppose the time we spend on the first phase, i.e., the phase where the global database $D$ is divided into $D_1, D_2, \ldots, D_n$, is $T_1$.

The time that thread $i$ ($i = 1, 2, \ldots, n$) spends on database $D_i$ to find the motorcade sequential pattern $S_{m_j}$ is $Q_{ij}$ ($j = 1, 2, \ldots, m$; $i = 1, 2, \ldots, n$). We use $m$ to indicate the number of motorcade sequences on $D_i$ and $n$ as the total number of threads. The total time thread $i$ spends on $D_i$ to find all the local sequences can then be represented as:

$$T_i^{(2)} = \sum_{j=1}^{m} Q_{ij}, \quad i = 1, 2, \ldots, n \tag{3}$$

With parallel computation on multi-cores, the time for generating all the motorcade sequential patterns is:

$$T_2 = \max_{1 \le i \le n} T_i^{(2)} = \max_{1 \le i \le n} \left( \sum_{j=1}^{m} Q_{ij} \right) \tag{4}$$

However, the time for the traditional serial computational method to find the sequential patterns is equivalent to the sum of the times of all threads treated separately:

$$T_2' = \sum_{i=1}^{n} T_i^{(2)} = \sum_{i=1}^{n} \sum_{j=1}^{m} Q_{ij} \tag{5}$$

In the third phase (combining local patterns to obtain all the frequent motorcade sequences), the time that thread $i$ takes is:

$$T_i^{(3)} = \sum_{j=q}^{k} F_j + t_i, \quad q = 1, 2, \ldots, k;\; i = 1, 2, \ldots, n \tag{6}$$

in which $F_j$ represents the processing time for file $j$, $k$ is the total number of files to combine, and $t_i$ is the system overhead for threads accessing the global task list, fetching new assignments and other system operations. Notably, $t_i \ll \sum_{j=q}^{k} F_j$. So, the time FSMMC spends on this phase with parallel processing on multi-cores is:

$$T_3 \approx \max_{1 \le i \le n} T_i^{(3)} \approx \max_{1 \le i \le n} \left( \sum_{j=q}^{k} F_j + t_i \right), \quad q = 1, 2, \ldots, k \tag{7}$$

By contrast, the time for traditional serial processing is approximately:

$$T_3' \approx \sum_{i=1}^{n} T_i^{(3)} = \sum_{i=1}^{n} \left( \sum_{j=q}^{k} F_j + t_i \right), \quad q = 1, 2, \ldots, k \tag{8}$$

In conclusion, the total time with FSMMC is:

$$T = T_1 + T_2 + T_3 = T_1 + \max_{1 \le i \le n} \left( \sum_{j=1}^{m} Q_{ij} \right) + \max_{1 \le i \le n} \left( \sum_{j=q}^{k} F_j + t_i \right) \tag{9}$$

The total running time with the traditional serial approach is:

$$T' = T_1 + T_2' + T_3' = T_1 + \sum_{i=1}^{n} \sum_{j=1}^{m} Q_{ij} + \sum_{i=1}^{n} \left( \sum_{j=q}^{k} F_j + t_i \right) \tag{10}$$

Therefore, because $\max_i \left( \sum_{j=1}^{m} Q_{ij} \right) \le \sum_{i=1}^{n} \sum_{j=1}^{m} Q_{ij}$ and $\max_i \left( \sum_{j=q}^{k} F_j + t_i \right) \le \sum_{i=1}^{n} \left( \sum_{j=q}^{k} F_j + t_i \right)$, we have $T \le T'$, and the FSMMC approach can achieve higher performance on multi-core processors.
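The comparison between Eq. (9) and Eq. (10) can be checked on a small numerical example. The Python sketch below is only an illustration: the function names `parallel_time` and `serial_time` are hypothetical, all timing values are made up, and each thread's merge work is assumed to be given as a list of per-file times plus one overhead value per thread.

```python
def parallel_time(t1, q, files_per_thread, overhead):
    """Eq. (9): the slowest thread determines the cost of phases 2 and 3."""
    t2 = max(sum(row) for row in q)
    t3 = max(sum(fs) + t for fs, t in zip(files_per_thread, overhead))
    return t1 + t2 + t3


def serial_time(t1, q, files_per_thread, overhead):
    """Eq. (10): the work of all threads is summed."""
    t2 = sum(sum(row) for row in q)
    t3 = sum(sum(fs) + t for fs, t in zip(files_per_thread, overhead))
    return t1 + t2 + t3


# Hypothetical timings (arbitrary units) for n = 3 threads.
t1 = 2                              # T_1: partitioning time
q = [[3, 4], [5, 1], [2, 2]]        # Q_ij: pattern-search times per thread
files = [[4], [3, 2], [1]]          # F_j: merge times of each thread's files
overhead = [1, 1, 1]                # t_i: task-list overhead per thread

t_parallel = parallel_time(t1, q, files, overhead)  # 2 + 7 + 6
t_serial = serial_time(t1, q, files, overhead)      # 2 + 17 + 13
print(t_parallel, t_serial)
```

In this made-up case the parallel estimate is 15 units against 32 units for the serial estimate. The gap narrows when one thread carries most of the work, which is why the approach tries to balance the workload across threads.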

4 Case Studies

4.1 Case Environments

The FSMMC approach has been successfully used for fast identification of VFATs in massive traffic data streams. In the experiment, VFATs are defined as N suspect motorcades that pass through the monitoring sites with a support over min_sup. In the testing phase, the attributes of vehicle passing records include the plate number, the time of passing monitoring sites, the location of monitoring sites and the vehicle driving direction. We ran the test program on a 2.40 GHz Intel Core 2 processor with 2 GB RAM running Windows XP. The databases used contained about 3,000,000 records. The FSMMC approach was implemented with JDK 1.6.

Fig. 2: VFATs found in the case where N = 2 (vehicles), δt = 60 (seconds) and min_sup = 2.5.

4.2 Case Results

We ran the FSMMC approach on traffic streams of different scales by spawning varying numbers of threads, where each thread executed the same code for frequent sequence mining. The approach provided good extensibility, since the number of threads can be changed as needed. Input datasets of the same size were used and all the results were saved in a file on hard disk for later use. The results generated are shown in Figure 2. When min_sup is set to 2.50, the VFATs are the top 15 records.

A more detailed analysis of the average running time used to search for VFATs is illustrated in Figure 3. As shown in Figure 3, the more sequences are generated, the more calculation time is required for file reduction. However, as the number of threads increases, this growth becomes smaller, especially when the dataset has more than 1,000,000 records. On a given multi-core system, the approach can employ the resources of the existing multi-core processors through multithread programming, leading to better results on larger volumes of data.
In order to verify the effectiveness of the FSMMC approach more intuitively, we analysed the speed-up with different numbers of threads on a four-core and a two-core processor (using the same datasets with 2,700,000 records). Figure 4 shows that the average T(2) is about 897.4 seconds in a multithreading environment with one to five threads, whereas T(4) is about 261.2 seconds. Thus, the average S(4)/S(2) is approximately 3.44. Furthermore, as can be seen in Figure 4, a processor with more cores obtains more stable results.

Due to the dynamic task distribution mechanism and local parallel reduction, the FSMMC approach reduces idle core time and the time required to combine sequences. It incorporates runtime performance characteristics and succeeds in using multi-core processor collaboration to optimize the performance of the parallel approach. This approach could therefore achieve good performance in identifying VFATs from massive traffic data streams.

Fig. 3: Running time of searching for VFATs from different scales of passing vehicles by FSMMC on a four-core processor.

Fig. 4: Running time and speed-ups on multiple cores with different thread numbers.
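The reported ratio follows directly from the speed-up in Definition 8. As a short worked step, and assuming the same serial baseline $T_s$ is used for both processors so that it cancels:

$$\frac{S(4)}{S(2)} = \frac{T_s / T(4)}{T_s / T(2)} = \frac{T(2)}{T(4)} \approx \frac{897.4}{261.2} \approx 3.44$$

In other words, with these averages the four-core runs took a little under a third of the time of the two-core runs.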

5 Related Works

The efficient analysis of spatio-temporal data generated by moving vehicles is an essential requirement for intelligent transportation services. To our knowledge, such research currently focuses mainly on methods for efficiently extracting long sharable frequent routes [5, 6], or Swarms [7], but not on deliberately trailing vehicles. In contrast to the ridesharing application, the identification of VFATs involves a huge amount of data and therefore demands more mining power.

Frequent pattern mining is a core field in data mining research. Since the first solution to the problem of frequent item-set mining was presented by Agrawal et al. [8], various specialized in-memory data structures have been proposed to improve mining efficiency [9]. It has been recognized that the set of all frequent item-sets is often too large to be analysed and that much of the information they contain is redundant. To remedy this, numerous works have studied parallel frequent pattern mining on clusters to improve mining efficiency [10, 11]. These works explore a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem-specific information in parallel data mining. However, the experiments showed that synchronization costs became quite large if the data distributions were skewed or the nodes were not equally capable.

For multi-core systems, which have lower inter-processor communication costs but limited off-chip bandwidth, parallel frequent pattern mining was pioneered by Buehrer et al. [12, 13]. Based on the serial algorithm gspan [14] and the similar study by Worlein et al. [15], Buehrer et al. proposed a parallel frequent graph mining algorithm with excellent scale-up properties. Their contribution comprises an efficient way to decompose work and to explore the search space in a depth-first way. They also proposed a way to exploit the temporal locality of the cache.
However, this method consumes excessive memory due to its static embedding techniques. Lucchese et al. proposed similar strategies for mining closed frequent item-sets, which contain optimizations for improving cache usage when creating conditional databases (called projections in their paper) [16]. Tatikonda et al. studied approaches to parallel frequent tree mining [17]. Their algorithm scales very well with the number of cores, leading to a quasi-linear speed-up on many real-world databases. However, it spends too much time on memory accesses.

The past few years have also witnessed the emergence of several novel approaches, other than multi-core ones, for the implementation and deployment of large-scale data mining. MapReduce, which has been popularized by Google, is a scalable and fault-tolerant data processing model that makes it possible to process a massive volume of data in parallel on many low-end computing nodes. In [18], we introduced a parallel implementation of the BIDE algorithm on MapReduce, called BIDE-MR. The experiments on an Apache Hadoop cluster show that BIDE-MR attains good parallelization. However, the approach presented in this paper is easier to implement since it effectively utilizes the multi-core structure of a single node.

6 Conclusions

This paper presents a novel approach to the fast identification of VFATs from massive traffic data streams on multi-core processors. To harness the power of multi-core processors, we use a dynamic task distribution mechanism to balance the workloads of different threads. A thread-steal happens when the cost of a task is not comparable with the cumulative cost of the other tasks. Both theoretical analysis and case studies show that the approach takes good advantage of multi-core computing platforms and has higher performance and speed-up, compared with other approaches without multi-threading.

It is notable that sequential pattern mining requires iterative scans of the sequence dataset with numerous data comparisons and analyses. In other words, it is memory intensive.
Therefore, optimizations of massive storage access are always needed. Other problems, such as how to increase the certainty of thread scheduling and how to limit the search space to further improve accuracy, still need to be studied.

Acknowledgements

The work is supported by Natural Science Foundation (No. 61472112), Natural Science Foundation of Zhejiang (No. LY12F02003), the Key Science and Technology Project of Zhejiang (No. 2012C11026-3, No. 2008C11099-1) and the open project of Zhejiang Provincial Key Laboratory of Network Technology and Information Security. The authors would also like to thank the anonymous reviewers who made valuable suggestions to improve the quality of the paper.

References

[1] Agrawal, R., Srikant, R., Mining sequential patterns. In: Proc. of ICDE, Taipei, Taiwan, Mar., 1995.
[2] Agrawal, R., Srikant, R., Fast algorithms for mining association rules. In: Proc. of VLDB, 1994, pp. 487-499.
[3] Jin, R., Yang, G., Agrawal, G., Shared memory parallelization of data mining algorithms: Techniques, programming interface, and performance. IEEE Trans. on Knowl. and Data Eng., vol. 17, no. 1, 2005, pp. 71-89.
[4] Fang, W., Lu, M., Xiao, X., He, B., Luo, Q., Frequent itemset mining on graphics processors. In: DaMoN 09: Proc. of the 5th International Workshop on Data Management on New Hardware, New York, NY, USA, ACM, 2009, pp. 34-42.
[5] Gidófalvi, G., Pedersen, T.B., Mining long, sharable patterns in trajectories of moving objects. GeoInformatica, vol. 13, no. 1, 2009, pp. 27-55.
[6] Xue, G., Li, Z., Zhu, H., Liu, Y., Traffic-known urban vehicular route prediction based on partial mobility patterns. In: Proc. of the International Conference on Parallel and Distributed Systems - ICPADS, 2009, pp. 369-375.
[7] Li, Z., Ding, B., Han, J., Kays, R., Swarm: Mining Relaxed Temporal Moving Object Clusters. In: Proc. of the VLDB Endowment, vol. 3, no. 1, 2010, pp. 723-734.
[8] Agrawal, R., Imielinski, T., Swami, A., Mining association rules between sets of items in large databases. In: Proc. of SIGMOD, 1993, pp. 207-216.
[9] Goethals, B., Survey on frequent pattern mining. In: http://citeseer.ist.psu.edu/goethals03survey.html, 2003.
[10] Agrawal, R., Shafer, J. C., Parallel mining of association rules. IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, 1996, pp. 962-969.
[11] Zaki, M. J., Parthasarathy, S., Ogihara, M., Li, W., Parallel algorithms for discovery of association rules. Data Min. Knowl. Discov., vol. 1, no. 4, 1997, pp. 343-373.
[12] Buehrer, G., Parthasarathy, S., Chen, Y. K., Adaptive parallel graph mining for CMP architectures. In: Proc. of ICDM, 2006, pp. 97-106.
[13] Buehrer, G., Parthasarathy, S., Kim, D., Towards data mining on emerging architectures. In: Proc. of 9th SIAM Workshop on High Performance and Distributed Mining, Bethesda, USA, 2006.
[14] Yan, X., Han, J., gspan: Graph-based substructure pattern mining. In: ICDM, 2002, p. 721.
[15] Worlein, M., Meinl, T., Fischer, I., Philippsen, M., A quantitative comparison of the subgraph miners mofa, gspan, ffsm, and gaston. In: Proc. of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Porto, Portugal, 2005, pp. 392-403.
[16] Lucchese, C., Orlando, S., Perego, R., Parallel mining of frequent closed patterns: Harnessing modern computer architectures. In: Proc. of ICDM, 2007, pp. 242-251.
[17] Tatikonda, S., Parthasarathy, S., Mining Tree-Structured Data on Multicore Systems. In: Proc. of VLDB, 2009, pp. 694-705.
[18] Yu, D., Wu, W., Zheng, S., Zhu, Z., BIDE-based parallel mining of frequent closed sequences with MapReduce. LNCS 7440, 2012, pp. 177-186.

Dongjin Yu is currently a professor at Hangzhou Dianzi University and a visiting scholar of University of California, Santa Barbara. He received his BS and MS in Computer Applications from Zhejiang University in China, and PhD in Management from Zhejiang Gongshang University in China. His current research efforts include intelligent information processing, program comprehension and service computing. He is especially interested in novel approaches to constructing large enterprise information systems effectively and efficiently with emerging advanced information technologies. He is the director of the Institute of Cloud and Big Data and vice director of the Institute of Intelligent and Software Technology of Hangzhou Dianzi University. He is a member of ACM and IEEE, and a senior member of the China Computer Federation (CCF). He is also a member of the Technical Committee of Software Engineering CCF (TCSE CCF) and a member of the Technical Committee of Service Computing CCF (TCSC CCF).

Wensheng Dou is currently a postgraduate at Hangzhou Dianzi University, China. He has participated in some government-funded projects related to data management. His current research interests mainly include data mining and big data processing.

Wanqing Li received his PhD degree in mechanics of solid from Lanzhou University in China in 2007 and works as an Associate Professor at Hangzhou Dianzi University. His present interests are numerical parallel computing and data mining.

Suhang Zheng received her master degree in computer science from Hangzhou Dianzi University in China. She has published a number of high-quality papers related to data mining. She now works for Alibaba.com.

Jianhua Shao received his bachelor's degree in Mathematics from Fudan University in China. His primary research area includes networking computing and system integration.