Improved Resource Allocation Algorithms for Practical Image Encoding in a Ubiquitous Computing Environment

JOURNAL OF COMPUTERS, VOL. 4, NO. 9, SEPTEMBER 2009 873

Mianxiong Dong, Long Zheng, Kaoru Ota, Song Guo
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 985-8580, Japan
Email: mx.dong@ieee.org, {m5112105, d8102104, sguo}@u-aizu.ac.jp

Minyi Guo, Li Li
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
Email: {guo-my, lljp}@cs.sjtu.edu.cn

Abstract: As a case study of the ubiquitous computing system, we have implemented a prototype for the JPEG encoding application. In order to achieve this eventual development in the real world, we studied resource allocation policies that can improve the overall performance of the system. In this paper, we consider those static and dynamic allocation approaches and then propose four different allocation algorithms. In particular, we extensively studied the dynamic allocation algorithms by exploring various cache policies which include disabled cache, unrestricted cache and restricted cache. Performance of these algorithms in large scale application scenario is also evaluated based on both the improved prototype and a simulation environment. The experimental results show a significant performance improvement achieved by the new proposed algorithms in terms of load balance, execution time, waiting time and execution efficiency.

Index Terms: Resource allocation algorithm, caching, pervasive computing

I. INTRODUCTION

The word ubiquitous means an interface, an environment and a technology that can provide all benefits in a transparent manner anytime and anywhere [1]. Ubiquitous computing is a concept that computing facilities are available everywhere in the real world [2]. In recent years, ubiquitous devices such as RFIDs, sensors, cameras, T-engines, and wearable computers have been consistently upgraded and have begun to play important roles in our daily life [3-5].
However, there are still many technical challenges in building such applications, which potentially exist in nearly every aspect of life, over infrastructure-less networks. Olympus Future Creation Laboratory and the University of Aizu have conducted collaborative research on developing a general framework for the coming ubiquitous society, in which a ubiquitous computing scenario named Ubiquitous Multi-Processor (UMP), supported by many heterogeneous processing nodes, has been extensively studied. In order to evaluate the scalability and performance of heterogeneous multiprocessor systems, a basic framework of a multiprocessor simulation system has been implemented based on a multi-way cluster [6], and a double-buffered communication model [7] has been incorporated into the system that can improve the performance by over 50% in terms of communication speed, independent of the types of individual processors. We have extended the system and implemented a ubiquitous multi-processor network-based pipeline processing framework [8], at the hardware simulation level, to support the development of high performance pervasive applications. As a special case, the distributed JPEG encoding application has been successfully developed upon the proposed framework. The performance of this practical image encoding application has been evaluated in [9-10], and the optimal packet size of the UMP network has been found through experiments in the UMP system.

In order to further improve the performance of this application for its practical deployment, we extend our previous work [9-11] by exploring various resource allocation techniques. In this paper, we propose a group of resource allocation algorithms and evaluate their performance in terms of load balance of the Resource Router (RR), total execution time, execution efficiency and task waiting time (delay).

The remainder of this paper is structured as follows. Section 2 gives the architecture of the UMP system. Section 3 discusses the existing resource allocation algorithm and proposes three improved algorithms.
The implementation details and performance evaluation are shown in Section 4. Section 5 summarizes our findings and the directions for future work.

II. AN OVERVIEW OF THE UBIQUITOUS MULTI-PROCESSOR SYSTEM

The architecture of our UMP system is illustrated in Fig. 1, in which there are three types of nodes: Client Node, Resource Router and Calculation Nodes. As the mobile terminals, Client Nodes send task requests to the UMP system through a wireless network. The Resource Router is the gateway of the UMP system; it receives requests from the Client Nodes and manages the corresponding tasks to be executed over the subnet. There exists only one Resource Router in a subnet. Each task can be decomposed into steps, each of which is executed on a specific Calculation Node in the subnet, so that the whole task is accomplished by a set of Calculation Nodes that cooperate in a sequential manner. Various services/tasks can thus be supported by different execution sequences of the Calculation Nodes.

Figure 1. The architecture of the UMP system based on our current implementation

As a case study of the UMP system, we implemented a prototype for the JPEG encoding application [8], which converts a bitmap format image to a JPEG format image in six steps: reading the bitmap file, RGB to YCbCr conversion, down sampling, Discrete Cosine Transform, Huffman encoding, and writing the JPEG image. At the beginning stage of the implementation, the scheduling algorithm was not the major optimization issue because it has little performance effect in a small-scale task request scenario. As we extend this prototype to a real-world JPEG encoding application, in which many users request tasks from the UMP system simultaneously, improved scheduling algorithms should be carefully designed to achieve good performance, e.g. load balancing, high execution efficiency and short waiting and execution time.

III. RESOURCE ALLOCATION ALGORITHMS

A. A Preliminary Algorithm

In this paper, we use JPEG encoding as an archetypal example to test the proposed algorithms. There are six stages to encode a bitmap file into a JPEG image. When a user requests a JPEG encoding task, the RR first reserves six PEs as a chain for the whole processing. After the user has connected to the first PE, the chain processing starts.
When the last PE finishes its sub-task, the user can get the result and the RR changes the entire PE chain to standby status. Because the user side is assumed to be a mobile client, battery lifetime is a very important factor in the system design. To reduce the energy consumption on the user side, we fix the first PE and the last PE, so that the user does not need to search repeatedly for the last PE. Thus, the optimization only affects the middle PEs of the processing chain. The algorithm can therefore be described as follows.

Static Allocating Algorithm (SA). When a task arrives, the RR reserves all the PEs needed to process the task until the task is finished. During the processing time, even if some PEs are free, they cannot be used by other tasks. The characteristics of the current resource allocation algorithm can be analyzed in two parts:

i) Mean delay:

$$ d = \frac{1}{n}\left[\sum_{i=1}^{\lfloor n/m \rfloor} m\,(i-1)\,t + \left(n - \lfloor n/m \rfloor m\right)\lfloor n/m \rfloor\, t\right] \quad (1) $$

where $n$ is the total number of tasks, $m$ is the number of tasks the RR can handle at one time, and $t$ is the time to handle $m$ tasks. So the first $m$ tasks wait 0 time, the second $m$ tasks wait $t$ time, and the $i$-th $m$ tasks wait $(i-1)t$ time. We can also get the task execution efficiency as follows:

ii) Task execution efficiency:

$$ f = \frac{\sum_{i=1}^{n} e_i}{\sum_{i=1}^{n} e_i + \sum_{j=1}^{n-1} c_{j,j+1}} \quad (2) $$

where $n$ now denotes the number of PEs in the chain, $e_i$ ($1 \le i \le n$) is the execution time in the $i$-th PE, and $c_{j,j+1}$ ($1 \le j < n$) is the communication time between the $j$-th PE and the $(j+1)$-th PE. In our simulation, we assume the communication time between any two PEs is the same, i.e. $c_{i,j} = c_{m,n} = c$ for all $i, j, m, n \in N$, where $N$ is the set of natural numbers. Hence,

$$ f = \frac{\sum_{i=1}^{n} e_i}{\sum_{i=1}^{n} e_i + (n-1)\,c} \quad (3) $$

The SA is summarized as follows.
(1) Router retrieves a new task from the task queue. If there is no task in the task queue, the procedure ends.
(2) Router generates a PE chain which is used to process the task. It sets the PEs in the PE chain as busy, which means these PEs cannot do any other task until they are released.
(3) Router sends the PE chain information and the task to PE1.
(4) PE1 finishes its work and follows the PE chain information to transfer the task to PE2.
(5) PE2 finishes its work and follows the PE chain information to transfer the task to PE3; PE3 likewise transfers the task to PE4.
(6) PE4 finishes its work and follows the PE chain information to transfer the task to PE5.
(7) PE5 finishes its work and follows the PE chain information to transfer the task to PE6.
(8) PE6 finishes its work and sends the processed task back to the router.
(9) Router sets all PEs in the PE chain as idle.
(10) Remove the task from the task queue. If there is no task left in the task queue, terminate. Otherwise go to Step (1).

B. Improved Algorithms

Dynamic Allocating Algorithm (DA). The biggest limitation of the current policy is that once the RR allocates the PEs to a user, all of them stay reserved until the whole task is finished. This is obviously a big waste of computational resources. To address this point, we apply a random distribution algorithm to the UMP system. The concept of DA is that after a PE finishes executing its process, it asks the RR for the PE of the next phase. The usage rate of the PEs is quite high, but the load on the RR is heavy. We can get the task execution efficiency as follows:

i) Task execution efficiency:

$$ f = \frac{\sum_{i=1}^{n} e_i}{\sum_{i=1}^{n} e_i + \sum_{i=1}^{n-1} cr_i + \sum_{i=2}^{n} c_{i-1,i}} \quad (4) $$

where $e_i$ ($1 \le i \le n$) is the execution time, $cr_i$ ($1 \le i < n$) is the communication time between the $i$-th PE and the RR, and $c_{i-1,i}$ ($2 \le i \le n$) is the communication time between the $(i-1)$-th PE and the $i$-th PE.

The DA is summarized as follows.
(1) Router retrieves a new task from the task queue. If there is no task in the task queue, the procedure ends.
(2) Router finds an idle PE1 and any PE6 and then transfers the task to this PE1.
(3) After getting the task from the router, this PE1 sends a busy status message to the router.
(4) After processing the task, this PE1 sends an idle status message to the router and meanwhile asks the router for the next PE.
(5) Router finds an idle PE2, and then tells PE1.
(6) PE2 sends the busy status to the router; PE1 transfers the task to PE2, and sends an idle status message to the router.
(7) PE2, PE3 and PE4 act in the same way.
(8) After PE5 has processed the task, PE5 transfers the task to the PE6 which was decided by the router at Step (2).
(9) After PE6 has processed the task, it transfers the processed task back to the router.
(10) Remove the task from the task queue. If there is no task left in the task queue, terminate. Otherwise go to Step (1).

Dynamic Allocating Algorithm with Cache Technology (DA-C). To improve the DA, we introduce a cache concept into the resource allocating algorithm. Every PE is assigned a cache in which it memorizes PEs of the next stage. When a PE finishes its sub-task, it searches its cache for the next-phase PE. If all the PEs in the cache are busy, it asks the RR to assign one free PE as the next-phase PE. We can get the task execution efficiency as follows:

i) The best case of task execution efficiency:

$$ f = \frac{\sum_{i=1}^{n} e_i}{\sum_{i=1}^{n} e_i + \sum_{i=2}^{n} 3\,c_{i-1,i}} \quad (5) $$

where $e_i$ ($1 \le i \le n$) is the execution time and $c_{i-1,i}$ ($2 \le i \le n$) is the communication time between the $(i-1)$-th PE and the $i$-th PE. The best case means each $(i-1)$-th PE can access the $i$-th PE memorized in its cache because the $i$-th PE is not busy. The $(i-1)$-th PE communicates with the $i$-th PE three times in total. In the first communication, the $(i-1)$-th PE asks the $i$-th PE whether or not it is currently busy. In the second communication, the $i$-th PE replies to the $(i-1)$-th PE. Since this is the best case, the $i$-th PE's answer is that it is available, so the $(i-1)$-th PE starts to send data to the $i$-th PE in the third communication.
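The expressions introduced so far can be checked numerically. The following is a minimal Python sketch, not the authors' simulator, that evaluates the SA mean delay of Eq. (1) and the efficiencies of Eqs. (3), (4) and (5). It assumes uniform communication times (`c` between PEs, `cr` between a PE and the RR); the uniform `cr` and all function names are assumptions made for illustration only.

```python
def sa_mean_delay(n_tasks, m, t):
    """Eq. (1): mean waiting time when tasks are served in batches of m, each batch taking t."""
    full = n_tasks // m                       # number of complete batches, floor(n/m)
    waited = sum(m * (i - 1) * t for i in range(1, full + 1))
    waited += (n_tasks - full * m) * full * t  # leftover tasks wait for all full batches
    return waited / n_tasks


def _efficiency(exec_times, overhead):
    """Ratio of pure execution time to execution time plus communication overhead."""
    busy = sum(exec_times)
    return busy / (busy + overhead)


def sa_efficiency(exec_times, c):
    """Eq. (3): n-1 PE-to-PE transfers of equal cost c."""
    return _efficiency(exec_times, (len(exec_times) - 1) * c)


def da_efficiency(exec_times, c, cr):
    """Eq. (4) with uniform costs: n-1 router queries (cr) plus n-1 transfers (c)."""
    hops = len(exec_times) - 1
    return _efficiency(exec_times, hops * cr + hops * c)


def dac_best_efficiency(exec_times, c):
    """Eq. (5): three PE-to-PE messages (ask, reply, send) per hop, no router query."""
    return _efficiency(exec_times, 3 * (len(exec_times) - 1) * c)


if __name__ == "__main__":
    e = [100.0] * 6                            # hypothetical execution times for six stages
    print(sa_mean_delay(5, 2, 10.0))
    print(sa_efficiency(e, 20.0), da_efficiency(e, 20.0, 50.0), dac_best_efficiency(e, 20.0))
```

The numbers passed in the usage lines are placeholders; only the structure of the formulas is taken from the text.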
ii) The worst case of task execution efficiency:

$$ f = \frac{\sum_{i=1}^{n} e_i}{\sum_{i=1}^{n} e_i + \sum_{i=2}^{n} 3\,c_{i-1,i} + \sum_{i=1}^{n-1} cr_i} \quad (6) $$

where $e_i$ ($1 \le i \le n$) is the execution time, $cr_i$ ($1 \le i \le n-1$) is the communication time between the $i$-th PE and the RR, and $c_{i-1,i}$ ($2 \le i \le n$) is the communication time between the $(i-1)$-th PE and the $i$-th PE. The worst case means each $(i-1)$-th PE cannot access the $i$-th PE memorized in its cache because that PE is busy. Therefore, each $(i-1)$-th PE has to ask the RR for a free PE.

The DA-C is summarized as follows.
(1) Router retrieves a new task from the task queue. If there is no task in the task queue, the procedure ends.
(2) Router finds an idle PE1 and any PE6 and then transfers the task to this PE1.
(3) After getting the task from the router, this PE1 sends a busy status message to the router.
(4) After processing the task, this PE1 sends an idle status message to the router.
(5) PE1 looks for an idle PE in its cache.
(6) If PE1 finds an idle PE2, it sends a request message to verify whether the PE2 is truly idle or not.
(7) If the response from PE2 is yes, go to Step (10); otherwise PE1 sends a request message to the router, which will look for an idle PE2 without being restricted to PE1's cache.
(8) If the router finds one, it tells PE1; otherwise, repeat from Step (5).
(9) Router finds an idle PE2, and then tells PE1.
(10) PE2 sends the busy status message to the router; PE1 transfers the task to PE2, and then sends the idle status message to the router.
(11) PE2, PE3 and PE4 act in the same way, and in addition update the cache information of PE1, PE2 and PE3, respectively.
(12) After PE5 has processed the task, PE5 updates the cache information of PE4, and then transfers the task to the PE6 which was decided by the router at Step (2).
(13) After PE6 has processed the task, it transfers the processed task back to the router.
(14) Remove the task from the task queue. If there is no task left in the task queue, terminate. Otherwise go to Step (1).

Dynamic Allocating Algorithm with Restricted Cache (DA-RC). We introduce a restricted cache concept to DA-C. The difference between DA-RC and DA-C is that DA-RC forbids jumping to PEs outside the cache of the current PE. That is, when a certain PE has finished its sub-task and all the PEs in its cache are busy, the PE will not ask the RR to assign a free PE from outside the cache. For example, assume every cache at each phase holds four PEs. When a PE at one stage has finished its sub-task, it can search for the next-phase PE only within that cache. If all the next-phase PEs in the cache are busy, it has to wait. This is the biggest difference between DA-C and DA-RC. The efficiency of DA-RC in the best case is the same as that of DA-C.
Here, the best case means each $(i-1)$-th PE succeeds in finding the next-phase PE in only one access, without asking all the PE members in the cache.

i) The worst case of task execution efficiency:

$$ f = \frac{\sum_{i=1}^{n} e_i}{\sum_{i=1}^{n} e_i + \sum_{i=2}^{n}\left(\sum_{j=1}^{l} 2\,c_{i-1,j} + c_{i-1,i}\right)} \quad (7) $$

where $e_i$ ($1 \le i \le n$) is the execution time, $c_{i-1,i}$ ($2 \le i \le n$) is the communication time between the $(i-1)$-th PE and the $i$-th PE, and $c_{i-1,j}$ ($2 \le i \le n$, $1 \le j \le l$), where $l$ is the number of PEs in one group, is the communication time between the $(i-1)$-th PE and the $j$-th next-phase PE in the group. The worst case means each $(i-1)$-th PE asks all the next-phase PEs in a group at every stage, because it checks one by one whether each PE in the group is busy in order to find an available one.

The DA-RC is summarized as follows.
(1) Router retrieves a new task from the task queue. If there is no task in the task queue, the procedure ends.
(2) Router finds an idle PE1 and any PE6 and then transfers the task to this PE1.
(3) After getting the task from the router, this PE1 sends a busy status message to the router.
(4) After processing the task, this PE1 sends an idle status message to the router.
(5) PE1 looks for an idle PE in its cache.
(6) If PE1 finds an idle PE2, it sends a request message to verify whether the PE2 is truly idle or not.
(7) If the response from PE2 is yes, go to Step (8); otherwise PE1 waits for a certain time and then goes to Step (5).
(8) PE1 transfers the task to PE2.
(9) PE2, PE3 and PE4 act in the same way, and in addition update the cache information of PE1, PE2 and PE3, respectively.
(10) After PE5 has processed the task, PE5 updates the cache information of PE4, and then transfers the task to the PE6 which was decided by the router at Step (2).
(11) After PE6 has processed the task, it transfers the processed task back to the router.
(12) Remove the task from the task queue. If there is no task left in the task queue, terminate. Otherwise go to Step (1).

IV. PERFORMANCE EVALUATION AND DISCUSSION

A. Detail of the Implementation

We built a simulation system to evaluate the four algorithms.
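The way the three dynamic algorithms of Section III choose the next-phase PE can be sketched as follows. This is a simplified Python illustration, not the authors' Java simulator: the function name `next_pe`, the PE identifiers and the returned request counter are hypothetical, and the busy/idle status messages and the PE-to-PE verification message of the step lists are left out.

```python
def next_pe(policy, cache, idle, next_stage):
    """Pick the next-phase PE for one hand-off.

    policy     -- "DA", "DA-C" or "DA-RC"
    cache      -- next-phase PE ids remembered by the current PE (ignored by DA)
    idle       -- set of PE ids that are currently idle
    next_stage -- every PE id of the next phase, as known to the router
    Returns (chosen PE id or None, number of next-PE requests sent to the router).
    """
    if policy not in ("DA", "DA-C", "DA-RC"):
        raise ValueError("unknown policy: " + policy)

    if policy != "DA":
        # DA-C and DA-RC first try the PEs memorized in the local cache.
        for pe in cache:
            if pe in idle:
                return pe, 0
        if policy == "DA-RC":
            # Restricted cache: never leave the cache, wait and retry later.
            return None, 0

    # DA always asks the router; DA-C asks it only after a cache miss.
    for pe in next_stage:
        if pe in idle:
            return pe, 1
    return None, 1


if __name__ == "__main__":
    stage2 = ["pe2a", "pe2b"]
    print(next_pe("DA", [], {"pe2b"}, stage2))
    print(next_pe("DA-C", ["pe2a"], {"pe2b"}, stage2))
    print(next_pe("DA-RC", ["pe2a"], {"pe2b"}, stage2))
```

Returning `None` for DA-RC models the waiting step; a caller would retry after a delay, which is the trade-off between router load and waiting that the algorithm descriptions discuss.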
Theoretically, the number of tasks arriving at the UMP system during each period is a random number with an upper bound from 8 to 60 and a lower bound of 0. We simulate 50 periods. Therefore, on average, the total number of tasks is from about 200 to 1500. Nevertheless, in our implementation, we ran the simulation with the number of tasks ranging from 200 to 1800 in steps of about 100. The number of PEs was set to 144. Because JPEG encoding needs 6 steps, each task needs six PEs; therefore there are 24 PE chains in total. We also set the network delay for a PE or the Router to send a request or get a response to 20, and the network delay for a PE or the Router to send or receive the raw JPEG to 200. Table 1 shows the environment of the simulation.

TABLE I
THE EXPERIMENT ENVIRONMENT

OS        Windows Vista Business (32-bit)
CPU       AMD Athlon64 3200+
Memory    DDR SDRAM 2GB
Language  JAVA 1.5
Network   Localhost
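The task counts and the number of chains quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes that the per-period arrival count is a uniformly distributed integer between 0 and the upper bound; the text only says "random", so the uniform distribution and all names below are assumptions for illustration.

```python
import random

PERIODS = 50     # simulated periods, from the text
NUM_PES = 144    # total number of PEs, from the text
STAGES = 6       # JPEG encoding steps, one PE per step


def expected_total_tasks(upper_bound, periods=PERIODS):
    """Mean total arrivals: a uniform integer on [0, upper_bound] averages upper_bound / 2."""
    return periods * upper_bound / 2


def sample_total_tasks(upper_bound, periods=PERIODS, seed=0):
    """Draw one simulated workload; the seed is fixed only to make runs repeatable."""
    rng = random.Random(seed)
    return sum(rng.randint(0, upper_bound) for _ in range(periods))


def chain_count(num_pes=NUM_PES, stages=STAGES):
    """Number of complete PE chains that can be formed."""
    return num_pes // stages


if __name__ == "__main__":
    print(expected_total_tasks(8), expected_total_tasks(60))  # 200.0 1500.0
    print(chain_count())                                       # 24
    print(sample_total_tasks(8))
```

Under the uniform assumption, the bounds 8 and 60 reproduce the averages of about 200 and 1500 tasks stated in the text.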

Figure 2. The workload of RR of various algorithms (panels: the SA algorithm, the DA algorithm, the DA-C algorithm, the DA-RC algorithm)

B. Simulation Results and Discussion

We use three-dimensional figures to provide an overall view in terms of the number of tasks, execution time and resource workload. Here, the router's workload is defined as follows: each time a PE sends a request to the RR, the workload is increased by 1. Fig. 2 shows the workload of the RR from the simulation results. The left axis indicates the number of tasks and the right axis indicates the execution time. The vertical axis shows the RR's workload. The workload of SA is obviously smaller than that of DA and DA-C because once the RR has assigned the PEs to execute a task, it never communicates with the PEs again. But when we focus on the total execution time, SA performs worst: its execution time is almost four times that of the other three algorithms. Even though the shapes of the red lines of DA, DA-C and DA-RC in these pictures are alike, it is easy to see that, by using the cache technology, DA-RC performs far better than DA and DA-C and comes close to SA in router workload. DA-RC's router workload is only 12.5% of that of DA and DA-C. Fig. 3, which is a two-dimensional view of the router workload, also shows the significant improvement of DA-RC. It is natural that DA has a bad result, because almost every time a PE has to ask the RR which next-phase PE it should connect to.

In Fig. 4, task execution efficiency is highly related to the waiting time. SA shows the worst result because a task has to wait for its execution to start even when there are free PEs in the process chain. We can see that the execution efficiency of DA-RC is 12.6% better than that of DA. Also, the curves of DA-C and DA-RC are almost the same, and they match the mathematical model described in the previous section.

Delay (waiting time) is an important factor in a real-world system.
Even if the total execution time of the system is good, the system still cannot be used well by users if the delay is huge. It is very clear that the average delay of SA is extremely large because the execution procedure is almost sequential. Hence, we omit it in Fig. 5 to prevent its negative influence on the curves of the other three algorithms. The average delay of SA is 60194, that of DA is 9189, that of DA-C is 4118 and that of DA-RC is 3540. From the figure, we can see that DA-RC and DA-C perform much better than DA in terms of average delay time. The reason why DA-RC is slightly better than DA-C may be that DA-RC avoids the time wasted on failed communication.

Figure 3. Algorithm VS (Router Workload)
Figure 4. Algorithm VS (Average Execution Efficiency)
Figure 5. Algorithm VS (Average Delay)
Figure 6. Algorithm VS (Execution Time)

DA-C and DA-RC show good performance again in Fig. 6. From the figure, we can see that the execution time of DA grows in an exponential manner, which makes this algorithm hard to accept for practical usage. On the other hand, the execution times of DA-C and DA-RC increase slowly even as the number of tasks becomes larger. Comparing the two algorithms with each other, DA-RC is 15% better than DA-C when the number of tasks is 1700. This suggests that DA-RC will outperform DA-C by a growing margin as the scale of the system becomes larger.

Fig. 7 to 10 compare the cases where the cache size is 1, 2 and 4. The execution efficiency and delay time are always good when the cache size is 4. In Fig. 7 the router workload is the same for all cache sizes, and for small numbers of tasks in Fig. 8, 9 and 10 the performance is almost the same. However, as the number of tasks grows, the cache of size 4 shows a better result than the other two cases.

V. CONCLUSION AND FUTURE WORK

As a further step of our previous work, we studied resource allocation policies that can improve the overall performance of the UMP system. In this paper, we considered static and dynamic allocation approaches and then proposed four different allocation algorithms.
We evaluated the performance of the four resource allocating algorithms and analyzed them from four points of view: load balance, execution time, waiting time and execution efficiency. Through these extensive experiments, we have validated that our proposed algorithms achieve significant performance improvement. We found that the Dynamic Allocating Algorithm with Restricted Cache is the best algorithm to allocate resources (PEs) under the condition that the system has many users and many tasks to deal with. The next best algorithm is DA-C, followed by DA. Furthermore, through the experiments we have realized that we can set the allocating policy flexibly to answer the users' requests. In the future, we shall consider a multi-layer architecture of the resource router to reduce the workload, and also focus on the upper layer of the UMP system, the service layer, to deal with the context information from users, resources and environments.

Figure 7. Router Workload under various cache sizes
Figure 8. Execution Time under various cache sizes
Figure 9. Average Delay Time under various cache sizes
Figure 10. Average Execution Efficiency under various cache sizes

ACKNOWLEDGEMENT

This work is supported by Future Creation Lab., Olympus Corp, Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists Program, the National High-Tech Research and Development Plan of China (863 Plan) under Grant Nos. 2008AA01Z106, the National Natural Science Foundation of China under Grant Nos. 60811130528, 60725208, and 60533040, and Shanghai Pujiang Plan No. 07pj14049. The authors are deeply grateful to Mr. Deze Zeng, Mr. Gongwei Zhang and Mr. Peng Li in the Performance Evaluation Laboratory at the University of Aizu.

REFERENCES

[1] M. Weiser, "The Computer for the 21st Century," IEEE Pervasive Computing, pp. 19-25, January-March 2002.
[2] Wikipedia, http://ja.wikipedia.org/wiki/
[3] M. Satyanarayanan, "Pervasive Computing: Vision and Challenges," IEEE Personal Communication, pp. 10-17, August 2001.
[4] S. R. Ponnekanti, et al., "Icrafter: A service framework for ubiquitous computing environments," in Proc. of Ubicomp 2001, pp. 56-75, Atlanta, Georgia, October 2001.
[5] V. Stanford, "Using Pervasive Computing to Deliver Elder Care," IEEE Pervasive Computing, pp. 10-13, January-March 2002.
[6] A. Shinozaki, M. Shima, M. Guo, and M. Kubo, "A High Performance Simulator System for a Multiprocessor System Based on a Multi-way Cluster," Advances in Computer Systems Architecture, Lecture Notes in Computer Science vol. 4186, pp. 231-243, Springer Berlin/Heidelberg, September 2006.
[7] A. Shinozaki, M. Shima, M. Guo, and M. Kubo, "Multiprocessor Simulator System Based on Multi-way Cluster Using Double-buffered Model," in Proc. of IEEE AINA 2007, pp. 893-900, Niagara Falls, Canada, May 2007.
[8] M. Kubo, B. Ye, A. Shinozaki, T. Nakatomi and M. Guo, "UMP-PerComp: A Ubiquitous Multiprocessor Network-Based Pipeline Processing Framework for Pervasive Computing Environments," in Proc. of IEEE AINA 2007, pp. 611-618, Niagara Falls, Canada, May 2007.
[9] M. Dong, S. Guo, M. Guo and S. Watanabe, "Design of the Ubiquitous Multi-Processor System Focusing on Transmission Data Size," in Proc. of HPSRN, pp. 158-166, Sendai, Japan, March 2008.
[10] M. Dong, S. Watanabe, and M. Guo, "Performance Evaluation to Optimize the UMP System Focusing on Network Transmission Speed," in Proc. of FCST, pp. 7-12, Wuhan, China, November 2007.
[11] M. Dong, M. Guo, L. Zheng, S. Guo, "Performance Analysis of Resource Allocation Algorithms Using Cache Technology for Pervasive Computing System," in Proc. of ICYCS 2008, pp. 671-676, Zhang Jia Jie, China, November 2008.

Mianxiong Dong received the B.S. and M.S. degrees, both in computer science and engineering, from the University of Aizu, Japan, in 2006 and 2008 respectively. He is currently a Ph.D. student and a JSPS (Japan Society for the Promotion of Science) Research Fellow at the School of Computer Science and Engineering, the University of Aizu, Japan. From January 2007 to March 2007, he was a visiting scholar at West Virginia University, USA. His research interests include pervasive computing, sensor networks, and ubiquitous-learning.

Long Zheng received the B.S. in computer science and technology from Huazhong University of Science and Technology, China, in 2006. He is now a Master student at the School of Computer Science and Engineering, the University of Aizu, Japan. His research interests include chip multiprocessor (CMP), pervasive computing and Peer-to-Peer media streaming.

Kaoru Ota received the B.S. degree in computer science and engineering from the University of Aizu, Japan, in 2006 and the M.S. degree in computer science at Oklahoma State University, USA, in 2008. She is currently a Ph.D. student at the School of Computer Science and Engineering, the University of Aizu, Japan. Her current research interests are localization and tracking by using mobile agents in wireless sensor networks.

Song Guo received the PhD degree in computer science from the University of Ottawa, Canada in 2005. He then held a position with the University of British Columbia on an NSERC (Natural Sciences and Engineering Research Council of Canada) postdoctoral fellowship. From 2006 to 2007, he was an Assistant Professor at the University of Northern British Columbia, Canada. He is currently an Assistant Professor at the School of Computer Science and Engineering, the University of Aizu, Japan. His research interests are in the areas of protocol design and performance analysis for communication networks, with a special emphasis on wireless ad hoc and sensor networks for reliable, energy-efficient, and cost effective communications.

Minyi Guo received the PhD degree in computer science from the University of Tsukuba, Japan. Before 2000, Dr. Guo had been a research scientist of NEC Corp., Japan, and a professor in the School of Computer Science and Engineering, The University of Aizu, Japan. Currently, Dr. Guo is a distinguished chair professor of the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China and an adjunct professor at the University of Aizu. He is also a guest professor at Nanjing University, Huazhong University of Science and Technology, and Central South University, China. Dr. Guo has published more than 160 research papers in international journals and conferences. Dr. Guo has served as general chair, program committee, or organizing committee chair for many international conferences. He is the founder of the International Conference on Parallel and Distributed Processing and Applications (ISPA) and the International Conference on Embedded and Ubiquitous Computing (EUC). He is the editor-in-chief of the Journal of Embedded Systems. He is also on the editorial board of the Journal of Pervasive Computing and Communications, the International Journal of High Performance Computing and Networking, the Journal of Embedded Computing, the Journal of Parallel and Distributed Scientific and Engineering Computing, and the International Journal of Computer and Applications. Professor Guo received the National Science Fund of China (NSFC) for Distinguished Young Scholars in 2007, and is also the PI of the NSFC Key Project "Theoretical and Technical key points of Pervasive Computing". Dr. Guo's research interests include parallel and distributed processing, parallelizing compilers, pervasive computing, embedded systems software optimization, and software engineering. He is a senior member of the IEEE and the IEEE Computer Society, and a member of the ACM, IPSJ, CCF, and IEICE.

Li Li received the M.E. degree in computer science and engineering from the University of Aizu, Japan in 2005. Her employment experience includes the Department of Information Physics of Nanjing University, China, and the National Institute for Environmental Studies, Tsukuba, Japan. Li Li has worked for the School of Software of Shanghai Jiao Tong University as an engineer since 2006. Her research interests include pervasive computing, sensor networks, and ubiquitous-learning.