A Parallelization Design of JavaScript Execution Engine


pp.171-184, http://dx.doi.org/10.14257/ijmue.2014.9.7.15

Duan Huicai 1,2, Ni Hong 2, Deng Feng 2 and Hu Linlin 2
1 National Network New Media Engineering Research Center, Institute of Acoustics, Chinese Academy of Sciences, Beijing 10190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
duanhc@dsp.ac.cn
Project supported by the National Science and Technology Support Program of China (No. 2012BAH73F01) and the CAS pilot special issue (No. XDA06040501).
ISSN: 1975-0080 IJMUE
Copyright © 2014 SERSC

Abstract
More and more consumer electronics now use multi-core chips. However, the traditional serial JavaScript execution engine, which is optimized by just-in-time (JIT) compilation, cannot take advantage of multiple cores. This paper proposes a mathematical model that detects dependencies among serial JavaScript tasks, together with a parallel execution algorithm for serial JavaScript execution engines. A parallel JavaScript execution engine using thread-level speculation is implemented on WebKit's SquirrelFish Extreme engine. Experiments were run on SunSpider, the industry's general benchmark platform, and on the world's top 15 websites by traffic volume. On both the real Web applications and SunSpider, parallel JavaScript execution engines with 2 to 16 threads raise performance substantially compared with a SquirrelFish execution engine with or without JIT acceleration. The one exception is the 2-thread engine on SunSpider, which is 39.7% slower than the JIT-accelerated engine.

Keywords: Multi-core chips, parallel computation, thread-level speculation, JavaScript

1. Introduction
JavaScript is a dynamically interpreted, object-oriented language. It is the main client-side scripting language of Web applications, so its execution performance directly affects the user's experience of those applications. The traditional way to accelerate JavaScript is JIT compilation [1-4], which generates machine code from running JavaScript code and executes that code. JIT acceleration also adds compilation overhead [4]. It pays off only under certain conditions, for example when the optimized code is executed repeatedly and the types of the JavaScript objects involved do not change.
JIT optimization has conventionally been shown to improve JavaScript performance in experiments on specific benchmark platforms [5, 6]. However, real Web applications behave differently from these industry benchmarks [7-9], and some studies [10] report that JIT optimization usually fails to reduce JavaScript execution time in common Web applications. In addition, although more and more consumer electronics use multi-core chips, JavaScript execution engines with JIT acceleration still run serially and therefore cannot use the extra cores. One study [11] found that parallelizing the serial JavaScript execution engine could raise the performance of Web applications by 45 times. To reduce the difficulty of parallel compilation, [12-14] propose applying Thread-Level Speculation (TLS) [15] to parallelize the JavaScript engine and achieve function-level parallelism. Function-level parallelism, however, cannot fully exploit the parallelism available in JavaScript applications. Real JavaScript applications contain many loops, and these allow a finer-grained, loop-level partitioning for parallel execution.
This paper analyzes the factors that govern function-level and loop-level parallelism in JavaScript. It proposes an approach for detecting data dependencies in JavaScript bytecode and presents an algorithm that divides a serial JavaScript program into tasks for parallel execution. It also proposes a parallelization solution for bytecode interpreters based on SquirrelFish Extreme. Tests on the world's top 15 websites by traffic volume show clear performance improvements.

2. Feasibility Analysis of Parallel JavaScript Execution Engine
2.1. Principle Analysis of JavaScript Execution Engine
A conventional JavaScript execution engine executes serially. It translates JavaScript code either into bytecode for interpretation [1, 3] or directly into machine code for execution [2]. Figure 1 shows the execution processes of three common JavaScript execution engines.
Figure 1. Execution Processes of JavaScript Execution Engines: (a) V8 Compilation Pipeline (JavaScript AST → Full CodeGen → native code; hot code → Crankshaft, through its Hydrogen and Lithium stages → optimized native code); (b) Bytecode Interpreter with Just-In-Time Compiler (JavaScript AST → bytecode; if native code exists it is executed, otherwise the bytecode is interpreted).
Figure 1 gives flow diagrams of how today's mainstream JavaScript execution engines interpret and execute JavaScript programs. Panel (a) shows V8's execution process. V8 converts JavaScript source code into abstract syntax trees, invokes the Full-Codegen compiler to compile them into platform-specific machine code, and executes that code. While the program runs, V8 invokes the Crankshaft compiler to optimize hotspot functions and generate better machine code.
Panel (b) shows a common engine design that combines a bytecode interpreter with a JIT compiler. JavaScript source code is again converted into abstract syntax trees. Unlike V8, however, this engine generates an abstract bytecode, which an interpreter executes. Meanwhile, a JIT compiler compiles bytecode hotspots into platform-specific machine code to improve performance. SquirrelFish Extreme and TraceMonkey both use this design but differ in optimization granularity: the former selects hotspots per method, the latter per trace [16]. In all of these designs, current JavaScript execution engines execute JavaScript code as a single-threaded sequence and cannot use multi-core chips to run faster.

2.2. Parallelism Instances of JavaScript Execution Engine
The parts of a JavaScript program always have some dependency relationships among them. To shorten the total running time, parallel processing must decompose the program into tasks that run in parallel without violating these dependencies. There is currently no general model for parallelizing a serial program, so the partition must take the features of the serial tasks into account. JavaScript programs contain many functions, so a callee and its caller can be executed in parallel during a function call. JavaScript code also contains many loops, and their iterations may likewise be executed in parallel.
Figure 2 shows an example JavaScript function. The JavaScript source is on the left and the corresponding bytecode pseudo-code is on the right, and the bytecode sequence is partitioned into 13 tasks. Tasks 2 and 3 are mutually independent, and so are Tasks 3 and 4, but they must execute after Task 1. Tasks 4-12 are the bytecode sequence of the unrolled JavaScript loop. Every iteration depends on the previous one, so the loop iterations cannot be executed in parallel. Task 13 cannot execute until all other tasks have completed.

    function f() {
        var i = 0;
        g();
        i++;
        for (var j = 1; j < 5; j++) {
            i += j;
        }
        return i;
    }
    function g() {
        return 1;
    }

Bytecode pseudo-code (partitioned into Tasks 1-13): set i = 0; call g; return 1; increment i; set j = 1; four rounds of (test loop condition j < 5; i += j; increment j); test loop condition j < 5; return i.
Figure 2. Example of JavaScript Source Code and its Corresponding Byte-Code Sequence
Figure 3 shows the sequence chart for the parallel execution of the tasks in Figure 2. First, Task 1 is executed in the main thread.
When Task 2 is reached, a new thread is opened to execute it, because Tasks 2 and 3 are mutually independent. The main thread then moves past Task 2 to Task 3. Because Tasks 3 and 4 are mutually independent, it opens another thread to execute Task 3. The main thread then moves past Task 3 and starts Task 4. The execution of Task 5 relies on

Tasks 3 and 4, so at this point Task 4 must be executed serially by the main thread. After Task 4 completes, the main thread must not execute Task 5 until Task 3 has completed. When Task 3 completes, it notifies the main thread, which then executes Task 5 and the subsequent tasks in order. Compared with serial execution, parallel execution therefore reduces the total time by the amount given in Equation (1):
Figure 3. Sequence Chart of JavaScript Parallel Execution (Thread 1 runs Tasks 1, 4 and 5-13; Thread 2 runs Task 2; Thread 3 runs Task 3; the chart marks the time spent creating new threads and the time spent waiting for Task 3 to finish before Task 5 starts)
$$\Delta T = T_{t2} + T_{t3} - T_{thread} - T_{wait5} \qquad (1)$$
Here $T_{t2}$ is the execution time of Task 2 and $T_{t3}$ is the execution time of Task 3. $T_{thread}$ is the time spent opening Threads 2 and 3, and $T_{wait5}$ is the time the main thread waits for Task 3 to complete before it executes Task 5. Task 3 runs concurrently with Task 4 and, as Figure 3 shows, takes longer, so:
$$T_{wait5} = T_{t3} - T_{t4} \qquad (2)$$
Substituting (2) into (1) gives the time saved by parallel execution:
$$\Delta T = T_{t2} + T_{t4} - T_{thread} \qquad (3)$$
Because Tasks 2 and 4 overlap with work in other threads, the time they take is saved compared with serial execution. Extra time, however, is spent opening threads. Parallelization therefore improves the performance of the JavaScript program whenever the serial execution time of Tasks 2 and 4 exceeds the time needed to open the threads. In real programs, task execution costs much more than opening a thread, so maximizing parallel execution can improve performance, and parallelizing serial JavaScript programs is feasible.
2. Parallel Algorithm of JavaScript Execution Engine
The key to parallel execution in a JavaScript engine is to identify the dependency relationships among JavaScript tasks.
A JavaScript program is partitioned into subtasks for parallel execution without violating these dependencies. Dependencies are of two types. Control dependencies change the program's flow, and data dependencies arise when tasks read and write the same data. Data dependencies determine which tasks of a JavaScript program may run concurrently, so the theory and techniques of data-dependency analysis are the basis of JavaScript program parallelism.
2.1. Mathematical Model Analysis of JavaScript Task Dependency
Let $P_i$ ($i = 1, 2, \dots, m$) denote a task in the JavaScript program and $V_j$ ($j = 1, 2, \dots, n$) a variable. When different tasks read or write the same variable, there

may be data dependencies between them. The $m \times n$ reading matrix $r$ is defined as:
$$r(i,j) = \begin{cases} 0, & P_i \text{ does not read } V_j \\ 1, & P_i \text{ reads } V_j \end{cases} \qquad (4)$$
The $m \times n$ writing matrix $w$ is defined as:
$$w(i,j) = \begin{cases} 0, & P_i \text{ does not write } V_j \\ 1, & P_i \text{ writes } V_j \end{cases} \qquad (5)$$
There are three types of data dependency: read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW) [17]. The $m \times m$ read-write dependency matrix $wr$ of JavaScript tasks is defined as
$$wr = r\,w^{T} \qquad (6)$$
so that $wr(i,j)$ counts the variables that $P_i$ reads and $P_j$ writes:
$$wr(i,j) = \begin{cases} 0, & \text{no RAW dependency between } P_i \text{ and } P_j \\ k\ (0 < k \le n), & k \text{ variables have a RAW dependency between } P_i \text{ and } P_j \end{cases} \qquad (7)$$
The $m \times m$ WAW dependency matrix $ww$ of JavaScript tasks is defined as
$$ww = w\,w^{T} \qquad (8)$$
so that
$$ww(i,j) = \begin{cases} 0, & \text{no WAW dependency between } P_i \text{ and } P_j \\ k\ (0 < k \le n), & k \text{ variables have a WAW dependency between } P_i \text{ and } P_j \end{cases} \qquad (9)$$
The $m \times m$ dependency matrix $d$ is defined as
$$d = wr + ww \qquad (10)$$
and
$$d(i,j) = \begin{cases} 0, & \text{no dependency between } P_i \text{ and } P_j \\ k\ (0 < k \le 2n), & \text{dependency between } P_i \text{ and } P_j \end{cases} \qquad (11)$$
Dependency is also transitive: if $d(i,j) \neq 0$ and $d(j,l) \neq 0$, then $P_i$ and $P_l$ are dependent. Tasks can run in parallel only if there is no dependency between them. Dependent tasks must be executed in order in the same thread.
2.2. Parallel Algorithm of JavaScript Execution Engine
Let $C$ denote the set of tasks that are executing or waiting to execute, $C \subseteq \{P_1, P_2, \dots, P_m\}$. Each time a new task $P_i$ ($i = 1, 2, \dots, m$) starts, it is put into $C$, and when a task completes, it is deleted from $C$. Let $C_j \in C$ ($0 < j \le m$) be a task under execution, let $Thread(C_j)$ be the thread executing it, and let $P(Thread(C_j))$ be the set of tasks that

depend on $C_j$ and are to be executed in the same thread as $C_j$, with $P(Thread(C_j)) \subseteq C$. $Index(P_i)$ is the serial number of task $P_i$ in $\{P_1, P_2, \dots, P_m\}$, and $F \in \{P_1, P_2, \dots, P_m\}$ is the next task to be executed. This paper proposes the following algorithm for parallel execution of serial JavaScript tasks:
Step 1: If $C = \emptyset$, execute $F$ in the main thread and put $F$ into $C$; otherwise proceed to Step 2.
Step 2: If $d(Index(F), Index(C_j)) = 0$ for every $C_j \in C$, execute $F$ in a new thread and put $F$ into $C$; otherwise proceed to Step 3.
Step 3: If $d(Index(F), Index(C_j)) \neq 0$ for some $C_j \in C$, put $F$ into $C$ and into $P(Thread(C_j))$, then proceed to Step 4.
Step 4: When $C_j$ completes, delete $C_j$ from $C$. If $P(Thread(C_j)) \neq \emptyset$, $Thread(C_j)$ continues with the next task $C_k \in P(Thread(C_j))$ and deletes $C_k$ from $P(Thread(C_j))$; otherwise $Thread(C_j)$ finishes and feeds its results back to the main thread. Then proceed to Step 5.
Step 5: The main thread moves past $F$ and analyzes the subsequent tasks. If any remain, proceed to Step 1; otherwise wait for all threads to complete and return the execution results.
3. Algorithm Implementation
This paper applies thread-level speculation and implements the parallel algorithm in a JavaScript execution engine based on SquirrelFish Extreme.
3.1. Algorithm for Serial JavaScript Task Partitioning
To execute a JavaScript program in parallel, the serial program must be divided into tasks, and the tasks that can run in parallel must be found in the task set. Thread-level speculation parallelizes the serial program dynamically, at two granularities: loop level and method level. Loop-level parallelism assigns one thread to each loop iteration, whereas method-level parallelism runs each function as one thread.
The SquirrelFish Extreme execution engine first translates JavaScript source code into bytecode [18]. During bytecode generation it partitions the serial JavaScript program into tasks and builds the task dependency matrix. In the SquirrelFish Extreme bytecode, the loop, loop_if_true and loop_if_less instructions start a loop, and their target parameter is the start address of a loop-level task. The call instruction is a function call, and its func parameter is the start address of a function-level task. The ret instruction marks a called function's return with its result.

Figure 4. Schematic Diagram of Task Partitioning for JavaScript Serial Program (the bytecode stream, with markers Start, Loop Target, Loop if true, call, pop, Loop Target and Loop if false, is divided into Tasks 1-7)
Figure 5. Pseudo-Code Realization of Task Partitioning for JavaScript Serial Program
Figure 4 illustrates task partitioning for a serial JavaScript program. The algorithm scans the bytecode sequence in order. Each time it meets a function call instruction or a loop iteration instruction, it records a new task and maps the task number to the instruction's address parameter. The bytecode fragments between iterations and function calls are also partitioned into separate tasks. Figure 5 shows the pseudo-code of this approach.
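The scan described for Figures 4 and 5 can be sketched in Python as follows. This is an illustration, not the SquirrelFish Extreme code. Bytecode is modelled as a list of hypothetical (opcode, operand) tuples, and only the loop, loop_if_true, loop_if_less and call opcodes named in Section 3.1 act as task boundaries. Each boundary task records the instruction's address operand as its start address.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Opcode names are the SquirrelFish Extreme instructions named in Section 3.1;
# representing bytecode as (opcode, operand) tuples is a simplification.
LOOP_OPS = {"loop", "loop_if_true", "loop_if_less"}
CALL_OPS = {"call"}

Instruction = Tuple[str, int]


@dataclass
class Task:
    number: int                      # 1-based task number
    kind: str                        # "plain", "loop" or "call"
    start_address: Optional[int]     # target/func operand for loop/call tasks
    instructions: List[Instruction] = field(default_factory=list)


def partition(bytecode: List[Instruction]) -> List[Task]:
    """Scan the bytecode in order and split it into tasks.

    Every loop or call instruction records a new task keyed by its address
    operand; the fragments between such instructions become plain tasks.
    """
    tasks: List[Task] = []
    current: Optional[Task] = None
    for op, operand in bytecode:
        if op in LOOP_OPS or op in CALL_OPS:
            kind = "loop" if op in LOOP_OPS else "call"
            current = Task(len(tasks) + 1, kind, operand)
            tasks.append(current)
        elif current is None or current.kind != "plain":
            # First instruction after a boundary opens a new plain fragment.
            current = Task(len(tasks) + 1, "plain", None)
            tasks.append(current)
        current.instructions.append((op, operand))
    return tasks


if __name__ == "__main__":
    # Hypothetical bytecode; operands are illustrative, not real SFX encodings.
    code = [("mov", 0), ("call", 40), ("pre_inc", 0), ("mov", 1),
            ("loop_if_less", 12), ("ret", 0)]
    for task in partition(code):
        print(task.number, task.kind, task.start_address, task.instructions)
```

On the sample input, the scan yields five tasks: a plain prologue, a call task starting at address 40, a plain fragment of two instructions, a loop task starting at address 12, and a plain epilogue.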

3.2. Dependency Matrix Computation
Figure 6 shows the pseudo-code of the algorithm. Once the JavaScript program has been partitioned into tasks, the i-th row $r(i,:)$ of the reading matrix and the i-th row $w(i,:)$ of the writing matrix are obtained from all the variables read and written by the speculative task $P_i$ ($i = 1, 2, \dots, m$). Repeating this for every task builds the reading matrix $r$ and the writing matrix $w$, and the task dependency matrix then follows from Equations (6)-(11). To reduce memory consumption, $r$, $w$ and $d$ are stored as sparse matrices.
Figure 6. Pseudo-Code Computations through JavaScript Task Dependency Matrix
3.3. Parallel Execution of Serial JavaScript
Once the dependency matrix has been obtained, the dependency value between two tasks in the matrix decides whether they can run in parallel. Tasks with dependency relationships are executed in the same thread. The implementation uses a thread pool to reduce the overhead of creating threads, and Figure 7 shows the pseudo-code.
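The dependency test of Equations (4)-(11) and the thread assignment of Steps 1-3 in Section 2.2 can be sketched in Python. This is not the implementation behind Figures 6 and 7, only a sketch under stated assumptions:
- The per-task read and write sets are hypothetical inputs.
- Python sets stand in for the sparse matrices, so |R_i ∩ W_j| + |W_i ∩ W_j| equals the entry d(i, j) of r·w^T + w·w^T.
- Task completion (Step 4) and the thread pool are not modelled.
- When an incoming task depends on tasks in several threads, it joins the thread of its most recent dependency and records a wait for the others. The steps do not specify this tie-break.

```python
from typing import Dict, List, Set, Tuple


def dependency_matrix(reads: List[Set[str]],
                      writes: List[Set[str]]) -> Dict[Tuple[int, int], int]:
    """Sparse d = r*w^T + w*w^T (Equations (6), (8) and (10)).

    reads[i] and writes[i] hold the variables read/written by task i
    (0-based, so index 0 is P_1). Only non-zero entries are stored.
    """
    m = len(reads)
    d: Dict[Tuple[int, int], int] = {}
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            raw = len(reads[i] & writes[j])    # wr(i, j): P_i reads what P_j writes
            waw = len(writes[i] & writes[j])   # ww(i, j): both write the variable
            if raw + waw:
                d[(i, j)] = raw + waw
    return d


def assign_threads(reads, writes):
    """Assign tasks to threads in program order following Steps 1-3.

    Task completion (Step 4) is not modelled, so every earlier task stays
    in the set C. Returns the task list of each thread (thread 0 is the
    main thread) and, per task, the earlier tasks in other threads it
    must wait for.
    """
    d = dependency_matrix(reads, writes)
    thread_of: Dict[int, int] = {}    # keys form the set C
    threads: List[List[int]] = []
    waits: Dict[int, List[int]] = {}
    for f in range(len(reads)):
        deps = [c for c in thread_of if d.get((f, c), 0) != 0]
        if not thread_of:                  # Step 1: C is empty -> main thread
            target = 0
            threads.append([])
        elif not deps:                     # Step 2: independent of every C_j
            target = len(threads)
            threads.append([])
        else:                              # Step 3: run in a dependency's thread
            target = thread_of[max(deps)]  # assumption: latest dependency wins
            others = [c for c in deps if thread_of[c] != target]
            if others:
                waits[f] = others
        threads[target].append(f)
        thread_of[f] = target
    return threads, waits


if __name__ == "__main__":
    # Hypothetical read/write sets shaped like the Figure 3 schedule.
    reads = [set(), set(), set(), {"i"}, {"b", "i"}]
    writes = [{"i"}, {"a"}, {"b"}, {"i"}, {"i"}]
    print(assign_threads(reads, writes))  # -> ([[0, 3, 4], [1], [2]], {4: [2]})
```

With these hypothetical sets, P_2 and P_3 get their own threads, P_4 and P_5 stay with P_1 on the main thread, and P_5 must wait for P_3. This is the shape of the Figure 3 chart. Like Steps 2-3, the sketch consults only the entry d(F, C_j). A variable that F writes but an earlier task only reads (a WAR pair) shows up in d(C_j, F) instead.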

Figure 7. Pseudo Codes for JavaScript Parallel Execution
4. Performance Evaluation
This paper evaluates performance from both theoretical and experimental perspectives and shows that the parallelized JavaScript execution engine can significantly improve the performance of JavaScript programs.

4.1. Theoretical Analysis
Suppose the serial JavaScript program can be partitioned into $k$ tasks, and the execution time of task $P_i$ ($i = 1, 2, \dots, k$) is $T_i$. Threads are allocated from a thread pool, so the allocation overhead is negligible compared with task execution time and is omitted. A JavaScript execution engine without JIT acceleration therefore needs time $\sum_{i=1}^{k} T_i$ to execute the $k$ tasks. Now suppose the program is executed in parallel by $d$ threads, with thread $j$ executing $k_j$ tasks ($j = 1, 2, \dots, d$). The parallel JavaScript execution engine then needs time $\max_{1 \le j \le d} \sum_{i=1}^{k_j} T_i$ to execute the $k$ tasks, where the $j$-th sum runs over the tasks assigned to thread $j$.
For a JavaScript execution engine with JIT acceleration, make three assumptions. The accelerated bytecode accounts for a fraction $\alpha$ ($0 < \alpha < 1$) of the total tasks. Optimizing that code costs a fraction $\beta$ ($0 < \beta$) of the total task execution time. The compiled JIT code runs $f$ ($0 < f$) times faster than the original bytecode. The engine with JIT acceleration then needs time $(1 - \alpha + \frac{\alpha}{f} + \beta)\sum_{i=1}^{k} T_i$ to execute the $k$ tasks.
Compared with the conventional serial JavaScript execution engine without JIT acceleration, the parallel execution engine therefore improves performance by a factor of:
$$\frac{\sum_{i=1}^{k} T_i}{\max_{1 \le j \le d}\sum_{i=1}^{k_j} T_i} \qquad (12)$$
The busiest thread needs at least $\frac{1}{d}\sum_{i=1}^{k} T_i$, so in theory the parallel JavaScript execution engine improves performance by at most $d$ times. It reaches this bound only if the serial JavaScript program can be partitioned into independent tasks that keep all $d$ threads equally busy. The value of $d$ that can actually be exploited depends on the specific JavaScript program, as the subsequent tests show. Compared with the serial JavaScript execution engine with JIT acceleration, the parallel execution engine improves performance by a factor of:
$$\frac{(1 - \alpha + \frac{\alpha}{f} + \beta)\sum_{i=1}^{k} T_i}{\max_{1 \le j \le d}\sum_{i=1}^{k_j} T_i} \qquad (13)$$
According to Pareto's 80/20 law, 20% of a program's code accounts for 80% of its total running time, so this paper sets $\alpha$ to 80% [19].
The average execution time of SquirrelFish Extreme bytecode is 10 times that of machine code [12, 18], so $f$ is set to 10, and $\beta$ is set to 20%. With these values, Equation (13) becomes:

$$\frac{0.48\sum_{i=1}^{k} T_i}{\max_{1 \le j \le d}\sum_{i=1}^{k_j} T_i} \qquad (14)$$
since $1 - 0.8 + 0.8/10 + 0.2 = 0.48$. The parallel engine therefore outperforms JIT acceleration only when its busiest thread needs less than 48% of the serial execution time. In other words, the serial JavaScript program must be largely parallelizable. The values used in Equation (13) depend on how efficiently the engine finds hotspots and how often those hotspots are subsequently executed, and in practice $\beta$ can even exceed 1 [20]. No general theory currently supports a quantitative analysis of the speedup gained by parallelizing serial JavaScript programs. Experiments on common Web applications are therefore needed to measure the improvement that the parallel JavaScript execution engine brings.
4.2. Experimental Analysis
Figure 8. The Comparative Chart of Test Performances on World Top 15 Websites at Traffic Volume
The experiments were run on an embedded network set-top box, using the general benchmark platform SunSpider and 15 websites from different business types and industrial fields. Figure 8 compares performance on the world's top 15 websites by traffic volume. JIT acceleration has little effect on these real websites: it raises average performance by only 5.43%, and on some websites it even reduces performance. The parallel JavaScript execution engine improves performance significantly. Compared with the serial JavaScript

execution engine without JIT acceleration, the parallel JavaScript engines with 2, 4, 8 and 16 threads improve average performance by 37.07%, 1.36 times, 4.08 times and 5.92 times, respectively. Compared with the serial JavaScript execution engine with JIT acceleration, they improve average performance by 30.01%, 1.24 times, 3.82 times and 5.57 times, respectively.
Figure 9 shows the test results on SunSpider. The JavaScript code in SunSpider better fits the assumptions of JIT optimization, so SunSpider is commonly used as a performance benchmark for JIT acceleration.
Figure 9. The Comparative Chart of Performance Tests on SunSpider Platforms
On SunSpider, JIT acceleration improves average performance by 1.62 times compared with the engine without JIT acceleration, but SunSpider's workload clearly differs from real Web workloads. Compared with the engine without JIT acceleration, the parallel engines with 2, 4, 8 and 16 threads raise average performance by 58.57%, 2.23 times, 4.97 times and 9.28 times, respectively. Compared with the serial engine with JIT acceleration, they change average performance by -39.7%, 22.86%, 1.27 times and 2.91 times, respectively. The characteristics of SunSpider therefore differ from those of current mainstream Web applications, and parallel engines with the same degree of parallelism produce different performance gains on the two types of workload. On mainstream Web applications, going from 8 to 16 threads brings only a modest gain (from 4.08 times to 5.92 times). On SunSpider, the gain reported for 16 threads (9.28 times) is nearly double that for 8 threads (4.97 times).
More threads are therefore not always better. The maximum useful degree of parallelism at the current granularity must be found experimentally, taking the features of the JavaScript tasks into account.
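A short Python sketch can evaluate the model of Equations (12)-(14) and cross-check the averages reported in this section. The function names and the sample thread-time lists are hypothetical. The sketch reads "improve by X times" as a speed of (1 + X) times the baseline. The paper does not state this reading, so it is an assumption.

```python
# Section 4.2 averages, as fractions: 37.07% -> 0.3707, "1.36 times" -> 1.36.
# Keys are thread counts.
WEB = {"no_jit": {2: 0.3707, 4: 1.36, 8: 4.08, 16: 5.92},
       "jit": {2: 0.3001, 4: 1.24, 8: 3.82, 16: 5.57},
       "jit_gain": 0.0543}
SUNSPIDER = {"no_jit": {2: 0.5857, 4: 2.23, 8: 4.97, 16: 9.28},
             "jit": {2: -0.397, 4: 0.2286, 8: 1.27, 16: 2.91},
             "jit_gain": 1.62}


def model_speedups(thread_times, alpha=0.8, beta=0.2, f=10.0):
    """Equations (12) and (13); thread_times[j] lists the T_i run by thread j."""
    serial = sum(sum(times) for times in thread_times)
    busiest = max(sum(times) for times in thread_times)
    jit_factor = 1 - alpha + alpha / f + beta  # 0.48 with the Section 4.1 values
    return serial / busiest, jit_factor * serial / busiest


def gain_over_jit(gain_no_jit, jit_gain):
    """Re-base a gain over the no-JIT engine onto the JIT engine.

    Assumes a gain g means a speed of (1 + g) times the baseline.
    """
    return (1 + gain_no_jit) / (1 + jit_gain) - 1


if __name__ == "__main__":
    # Four equal tasks on two threads: Eq. (12) = 2 and Eq. (13) = 0.48 * 2 = 0.96.
    print(model_speedups([[1.0, 1.0], [1.0, 1.0]]))
    for name, data in (("websites", WEB), ("SunSpider", SUNSPIDER)):
        for threads, g in data["no_jit"].items():
            derived = gain_over_jit(g, data["jit_gain"])
            print(f"{name} {threads:2d} threads: derived {derived:+.4f}, "
                  f"reported {data['jit'][threads]:+.4f}")
```

Under this reading, the re-based values agree with the reported JIT-relative averages to within about 0.015 in every case. The largest gap is for 16 threads on SunSpider. This suggests, though it does not prove, that the two sets of figures were derived this way.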

5. Conclusions
Current serial JavaScript execution engines cannot exploit multiple cores, so this paper proposes a design method for a parallel JavaScript execution engine, tested on real Web applications and on SunSpider.
On real Web applications, parallel engines with 2, 4, 8 and 16 threads raise performance by 37.07%, 1.36 times, 4.08 times and 5.92 times, respectively, over a serial JavaScript execution engine without JIT acceleration. Over a serial engine with JIT acceleration, they raise performance by 30.01%, 1.24 times, 3.82 times and 5.57 times, respectively.
On SunSpider, the same engines raise performance by 58.57%, 2.23 times, 4.97 times and 9.28 times, respectively, over a serial engine without JIT acceleration. Over a serial engine with JIT acceleration, they change performance by -39.7%, 22.86%, 1.27 times and 2.91 times, respectively.
The algorithm exploits function-level and loop-level parallelism at the same time and significantly improves the performance of Web workloads with many loops or function calls. However, current Web workloads also contain much code outside loops and function calls, and this code could also be executed in parallel, so future work may focus on parallelism at bytecode granularity. In addition, the parallel JavaScript execution engine may consume extra memory, so memory optimization will also be a key topic for future research.
Acknowledgements
I thank my supervisor, whose effort and insight guided this paper from the choice of topic through writing and revision, and I thank the funding agencies for their support.
References
[1] G. Garen, "Announcing SquirrelFish", (2008), http://www.webkit.org/blog/189/announcing-squirrelfish/.
[2] Google Inc., "A New Crankshaft for V8", (2010), http://blog.chromium.org/2010/12/new-crankshaft-for-v8.html.
[3] MozillaWiki, "JavaScript:TraceMonkey", (2010), https://wiki.mozilla.org/JavaScript:TraceMonkey.
[4] A. Gal, B. Eich, M. Shaver, D. Anderson, D. Mandelin, M. R. Haghighat and M. Franz, "Trace-based just-in-time type specialization for dynamic languages", ACM SIGPLAN Notices, vol. 44, no. 6, (2009) June, pp. 465-478.
[5] WebKit, "SunSpider 1.0.2 JavaScript Benchmark", (2013), https://www.webkit.org/perf/sunspider-1.0.2/sunspider-1.0.2/driver.html.
[6] Google, "V8 Benchmark Suite", (2009), http://v8.googlecode.com/svn/data/benchmarks/v3/run.html.
[7] J. K. Martinsen and H. Grahn, "A methodology for evaluating JavaScript execution behavior in interactive web applications", Proc. of the 9th ACS/IEEE Int'l Conf. on Computer Systems and Applications, (2011) December, pp. 241-248.
[8] P. Ratanaworabhan, B. Livshits and B. G. Zorn, "JSMeter: Comparing the behavior of JavaScript benchmarks with real web applications", WebApps'10: Proc. of the 2010 USENIX Conf. on Web Application Development, (2010), pp. 3-3.
[9] G. Richards, S. Lebresne, B. Burg and J. Vitek, "An analysis of the dynamic behavior of JavaScript programs", PLDI'10: Proc. of the 2010 ACM SIGPLAN Conf. on Programming Language Design and Implementation, (2010), pp. 1-12.
[10] J. K. Martinsen, H. Grahn and A. Isberg, "A comparative evaluation of JavaScript execution behavior", Web Engineering, Springer Berlin Heidelberg, (2011), pp. 399-402.
[11] E. Fortuna, "A limit study of JavaScript parallelism", 2010 IEEE International Symposium on Workload Characterization (IISWC), IEEE, (2010).

[12] J. K. Martinsen, H. Grahn and A. Isberg, "Using speculation to enhance JavaScript performance in web applications", IEEE Internet Computing, vol. 17, no. 2, (2013), pp. 10-19.
[13] J. K. Martinsen and H. Grahn, "An alternative optimization technique for JavaScript engines", Proc. of the Third Swedish Workshop on Multi-Core Computing (MCC-10), (2010).
[14] J. K. Martinsen and H. Grahn, "Thread-level speculation for web applications", Second Swedish Workshop on Multi-Core Computing, (2009).
[15] C. E. Oancea and A. Mycroft, "Software thread-level speculation: an optimistic library implementation", Proc. of the 1st International Workshop on Multicore Software Engineering, ACM, (2008).
[16] A. Gal, "Trace-based just-in-time type specialization for dynamic languages", ACM SIGPLAN Notices, vol. 44, no. 6, ACM, (2009).
[17] S. Gupta, "SPARK: A high-level synthesis framework for applying parallelizing compiler transformations", Proc. of the 16th International Conference on VLSI Design, IEEE, (2003).
[18] WebKit, "SquirrelFish bytecode", (2013), http://www.webkit.org/specs/squirrelfish-bytecode.html.
[19] J. K. Martinsen, H. Grahn and A. Isberg, "Preliminary results of combining thread-level speculation and just-in-time compilation in Google's V8", Sixth Swedish Workshop on Multicore Computing (MCC-13), Halmstad University, (2013).
[20] J. K. Martinsen, H. Grahn and A. Isberg, "A comparative evaluation of JavaScript execution behavior", Web Engineering, Springer Berlin Heidelberg, (2011), pp. 399-402.