GrinderBench software benchmark data book


Table of Contents

Calculating the Grindermark
Chess
Crypto
kxml
Parallel
PNG

Name: Calculating the Grindermark

The Grindermark and the GrinderBench Java Suite

EEMBC GrinderBench is a suite of five individual benchmark applications executed in the context of a benchmark framework. Each benchmark application is designed to mimic a complex real-world application, performing computations and operating on data relevant to that application scenario. Each benchmark application computes a single score at completion, so one complete execution of all benchmark applications yields five individual scores. These detailed scores offer the highest value to system designers, because they allow comparison of the individual applications that are specific to their designs.

To simplify comparisons and enhance the presentation of comparative data on Java platform performance, a single-number score called the Grindermark can be computed in addition to the scores of the individual benchmark applications. Grindermark numbers are intended to provide a first-order representation of Java platform performance.

Grindermark Score Computation

Because EEMBC GrinderBench targets the broadest range of embedded Java platforms (including memory-limited CLDC 1.0 platforms), it was not possible to include the computation of the Grindermark score in the benchmark suite itself. There are two options for computing the Grindermark score once all five individual scores have been obtained:

- Use the online Grindermark score calculator at www.grinderbench.com/howto.html
- Compute the Grindermark score by taking the geometric mean of the five individual benchmark application scores:

  Grindermark score = geomean(Chess score, Crypto score, kxml score, Parallel score, PNG score)

Either option results in a correct Grindermark score. For a comparison of scores, see www.grinderbench.com/benchmarks

Using Geometric Mean versus Arithmetic Mean

EEMBC uses a geometric mean to calculate the Grindermark to ensure equal weighting for all five benchmarks in the GrinderBench suite. This matters because the typical iterations per second achieved in each benchmark kernel vary widely from kernel to kernel. If an arithmetic mean were used, kernels that tend to yield a relatively small number of iterations per second would have virtually no influence on the single-number score. In effect, an arithmetic mean of results would impose an arbitrary weighting system that heavily favors the tests with the most iterations per second. The geometric mean avoids this problem.
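The manual computation can be sketched in Java as follows. This is a minimal illustration rather than EEMBC's calculator: the class name, method names, and the five scores are hypothetical. It uses floating-point math, so it is meant to be run on a host machine after the five scores have been collected.

```java
// Sketch: Grindermark as the geometric mean of the five benchmark scores.
// Class name, method names, and sample scores are illustrative assumptions.
class GrindermarkSketch {

    /** Geometric mean computed through logarithms, so large scores cannot overflow a product. */
    static double geomean(double[] scores) {
        double logSum = 0.0;
        for (double s : scores) {
            if (s <= 0.0) {
                throw new IllegalArgumentException("scores must be positive");
            }
            logSum += Math.log(s);
        }
        return Math.exp(logSum / scores.length);
    }

    /** Arithmetic mean, shown only for comparison. */
    static double arithmeticMean(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    public static void main(String[] args) {
        // Hypothetical scores in the order Chess, Crypto, kxml, Parallel, PNG.
        double[] scores = {2.0, 40.0, 8.0, 1000.0, 5.0};
        System.out.println("geometric mean:  " + geomean(scores));
        System.out.println("arithmetic mean: " + arithmeticMean(scores));
    }
}
```

With these made-up scores the arithmetic mean is 211 and is dominated by the single score of 1000, while the geometric mean is about 20. Doubling any one score, large or small, raises the geometric mean by the same factor (the fifth root of 2), which is the equal weighting described above.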

Name: Chess

Highlights

- Performs weighted tree searches
- Performs variable depth searches to prevent repetition
- No file I/O
- Plays three games of 10 moves each
- Method, logic and array intensive
- Low use of native methods

Application

Chess is a game with a predefined set of rules. It has 32 pieces on a board of 64 squares. For the electronic or computer version of chess, there are a variety of methods available for determining the best move. Each piece has a possible set of moves, ranging from zero upwards. These moves are generated by search algorithms within the class representing each type of chess piece. The program can then map all possible moves, in all possible directions, against pieces that are in the way. Collision detection also has another role: weighting the decision toward the opponent's pieces in order to capture them. Positions that present an immediate danger of capture weight against a move. Each piece type or class also has an array that represents favored positions on the board, which is also used as a weighting.

Electronic chess games have to think ahead. There's no use moving a piece if, after the next move, it or another piece can be taken. So the program iterates through speculative moves and builds a tree of actual and possible moves to get a better idea of whether a particular move is a good one. Thus the board can be scanned in 64 steps for all of a player's 16 pieces (assuming they are all still there!), and each of a piece's possible moves can be analyzed for the impact it makes. All these possible moves (favored or all) can be taken for further analysis at further depth.

This property of weighted problem solving with logic might explain the popularity of chess games for computers, with advanced games being able to play out many thousands of potential scenarios at every point in the game. It is also the reason why chess makes a good test of the machine it is running on.

The chess benchmark only performs the logical parts of a chess program, as no graphical output is available. It plays a preset number of games against itself and times how long this takes.

Analysis of Computing Resources

The chess benchmark runs to completion and its execution is timed directly, without other timing variations. Therefore, lower times are better. Overall, the chess benchmark is simple and should be executable on very small Java-enabled devices. It plays against itself for the preset number of moves, using the chess algorithms for the black and white pieces alternately.

Special Notes

It is possible to change the behavior by using the command line arguments - -debug or - -boards, and by supplying two numbers: <games> and <moves>
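The three weightings described in the Application section (favored positions, captures, and danger of being captured) can be combined into a single move score. The sketch below is a hypothetical illustration, not the benchmark's actual evaluation code: the class, the centre-favoring position table, and the piece values are all assumptions.

```java
// Sketch of weighted move scoring. All names and weights are illustrative
// assumptions; the real benchmark's tables and rules are not reproduced here.
class MoveWeightSketch {

    /** Hypothetical favored-position weight: 0 on a corner, 6 on a central square. */
    static int positionWeight(int square) {
        int row = square / 8;
        int col = square % 8;
        int rowDepth = Math.min(row, 7 - row); // 0 at the edge, 3 next to the centre line
        int colDepth = Math.min(col, 7 - col);
        return rowDepth + colDepth;
    }

    /**
     * Scores one candidate move: favored position, plus the value of any
     * captured piece, minus the mover's own value if the target is attacked.
     */
    static int scoreMove(int targetSquare, int capturedValue, boolean targetAttacked, int ownValue) {
        int score = positionWeight(targetSquare) + capturedValue;
        if (targetAttacked) {
            score -= ownValue;
        }
        return score;
    }

    /** Returns the index of the highest-scoring candidate (first one wins a tie). */
    static int bestMove(int[] targets, int[] captured, boolean[] attacked, int ownValue) {
        int best = 0;
        for (int i = 1; i < targets.length; i++) {
            int candidate = scoreMove(targets[i], captured[i], attacked[i], ownValue);
            int current = scoreMove(targets[best], captured[best], attacked[best], ownValue);
            if (candidate > current) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] targets = {0, 27, 36};              // candidate destination squares
        int[] captured = {5, 0, 1};               // value of the piece captured there, if any
        boolean[] attacked = {true, false, true}; // would the mover be in danger there?
        System.out.println("best candidate index: " + bestMove(targets, captured, attacked, 3));
    }
}
```

A look-ahead search would apply a score like this at each node of the move tree, which is where the method-call and array-access load of the benchmark comes from.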

Name: Crypto

Highlights

- Uses the Bouncy Castle Crypto package, which is based on the MIT X Consortium's work and is a clean-room implementation of the JCE API
- Contains multiple encrypt/decrypt engines
- Encrypts and decrypts a 4 KB text string, using the System.currentTimeMillis() method to time the execution
- Exercises DES, DESede, IDEA, Blowfish, and Twofish

Application

CryptoBench contains multiple encrypt/decrypt engines. A 4 KB text string is encrypted and then decrypted, and the System.currentTimeMillis() method is used to time the execution. The following encryption algorithms are exercised: DES, DESede, IDEA, Blowfish, Twofish. The first argument is the key resource name, and the second argument is the text to encrypt and then decrypt. After fetching the key and the data, the benchmark runs, in succession, the encryption followed by the decryption for each of the algorithm types. The engines are called in the following order:

- DES encrypt
- DES decrypt
- DESede encrypt
- DESede decrypt
- IDEA encrypt
- IDEA decrypt
- Blowfish encrypt
- Blowfish decrypt
- Twofish encrypt
- Twofish decrypt

If the answer at the very end does not match what is expected, an error is printed. As with all EEMBC GrinderBench Java benchmarks, the entire benchmark is run five (5) times.

Analysis of Computing Resources

This is a mathematically intensive benchmark that uses integer math only.

Special Notes

Detailed specifications can be found at: http://www.bouncycastle.org/specifications.html. Each cryptographic engine must be executed to get the full score.
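The encrypt-then-decrypt sequence and its timing can be sketched as below. This is only an approximation of the benchmark's flow. It assumes the JDK's javax.crypto API in place of the Bouncy Castle package, covers only DES, DESede, and Blowfish (IDEA and Twofish are not provided by a standard JDK), and uses placeholder keys and text instead of the benchmark's key and text resources.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Sketch of a timed encrypt/decrypt round trip per algorithm.
// Assumption: JDK ciphers stand in for the Bouncy Castle engines.
class CryptoTimingSketch {
    static final String[] ALGORITHMS = {"DES", "DESede", "Blowfish"};
    static final int[] KEY_BYTES = {8, 24, 16};

    /** Runs one cipher pass in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE). */
    static byte[] apply(String algorithm, int mode, byte[] key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(algorithm + "/ECB/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(key, algorithm));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(algorithm + " failed", e);
        }
    }

    /** Encrypts, then decrypts, with each algorithm in turn; true if every round trip restores the text. */
    static boolean roundTripAll(byte[] text) {
        for (int i = 0; i < ALGORITHMS.length; i++) {
            byte[] key = new byte[KEY_BYTES[i]];
            for (int j = 0; j < key.length; j++) {
                key[j] = (byte) (7 * j + 1); // placeholder key material, not secure
            }
            byte[] encrypted = apply(ALGORITHMS[i], Cipher.ENCRYPT_MODE, key, text);
            byte[] decrypted = apply(ALGORITHMS[i], Cipher.DECRYPT_MODE, key, encrypted);
            if (!Arrays.equals(text, decrypted)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] text = new byte[4096]; // stand-in for the 4 KB text string
        for (int i = 0; i < text.length; i++) {
            text[i] = (byte) ('a' + i % 26);
        }
        long start = System.currentTimeMillis();
        boolean ok = roundTripAll(text);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println((ok ? "round trips matched" : "ERROR: mismatch") + " in " + elapsed + " ms");
    }
}
```

Checking the decrypted text against the original mirrors the final verification step described above, where a mismatch is reported as an error.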

Name: kxml

Highlights

- Utilizes the kxml XML parsing package
- Tests XML parsing and DOM tree manipulation
- Very flexible; execution is controlled by command scripts

Application

The kxml benchmark measures XML parsing and/or DOM tree manipulation. The actual parsing and manipulation is done by the kxml package, which is available under a modified open-source license. Details are available at http://www.kxml.org. This package is designed for use in small-footprint environments.

The benchmark takes a command script as input. The script may contain any sensible combination of the following commands:

- Parse an XML document and store it as a DOM tree representation
- Parse an XML document and insert it into an existing DOM tree at the specified node
- Search a DOM tree (already in memory) for a particular element name
- Search a DOM tree (already in memory) for a particular text string
- Create a DOM tree with empty nodes
- Delete DOM trees

The benchmark maintains a hash table of DOM trees, so that each command may refer to and make use of the results of previous commands. In short, the kxml benchmark processes a command script that specifies which XML documents to parse and which DOM tree manipulations to perform.

Analysis of Computing Resources

The kxml benchmark runs to completion and its execution is timed directly, without running multiple iterations or using other timing variations. Therefore, lower times are better. Overall, the kxml benchmark is simple and should be executable on very small Java-enabled devices. It relies on the kxml package, which should give it more general stability.

Special Notes

Although resource streams are used to access the input data (command script and XML documents), these are read into ByteArrayInputStreams before the actual timing begins. This is done to focus the benchmark on the computational aspects of XML parsing and DOM tree manipulation, rather than on the implementation's access to static resources.
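The pre-loading step described in the Special Notes can be sketched as follows. This is an assumption-laden illustration: the class and method names are hypothetical, an in-memory stream stands in for the resource stream, and a trivial count of '<' characters stands in for the kxml parsing work.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: copy a resource into memory first, then time only the processing.
class PreloadSketch {

    /** Drains any input stream into a ByteArrayInputStream (the untimed step). */
    static ByteArrayInputStream preload(InputStream resource) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[512];
            int n;
            while ((n = resource.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new ByteArrayInputStream(buffer.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException("could not read resource", e);
        }
    }

    /** Placeholder workload standing in for XML parsing: counts '<' characters. */
    static int countTagOpeners(ByteArrayInputStream in) {
        int count = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for a resource stream such as one returned by getResourceAsStream.
        InputStream resource = new ByteArrayInputStream("<doc><item/><item/></doc>".getBytes());

        ByteArrayInputStream inMemory = preload(resource); // not timed

        long start = System.currentTimeMillis();            // timing starts here
        int tags = countTagOpeners(inMemory);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(tags + " tag openers in " + elapsed + " ms");
    }
}
```

Because the clock starts only after the copy, slow access to static resources on a given platform does not affect the measured time.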

Name: Parallel

Highlights

- Tests the performance of thread switching and synchronization in a Java virtual machine
- Two parallel algorithms are executed separately
- Each thread executes simple mathematical and array sorting operations

Application

The ParallelBench benchmark tests the performance of KVM threading capabilities. It accomplishes this by dividing computational tasks among several threads that must then cooperate with each other to complete those tasks. Two parallel algorithms are used. First, the benchmark executes a mergesort algorithm, which sorts a list by having each thread sort a subset of the list and then merging the sublists. The second algorithm is a parallel matrix multiplication, which multiplies two matrices by having each thread work on a block of values.

The ability to switch threads quickly is a key component of MIDP applications. Most MIDP applications work by having a minimum of two threads running concurrently. One thread drives or updates the state of the application while another processes user interface events. To create a good user experience, the virtual machine must quickly switch between the event-handling thread and the thread that maintains the application state.

The ParallelBench benchmark runs the mergesort algorithm and the matrix multiplication algorithm with 2, 4, 8, and 16 threads.

The ParallelBench benchmark first executes the mergesort algorithm. The algorithm begins by dividing an unsorted list into P equal-length sublists (where P is the number of threads being used). It then starts the worker threads by sending messages to the message queue. Each message contains information about the array and the portion to sort. The worker threads sort each sublist using the bubblesort algorithm. After the threads finish sorting their respective sublists, P/2 threads merge the sublists together. The merging of sublists in separate threads is repeated until all the sublists are merged into a single list.

After completing the mergesort tests, ParallelBench tests thread processing with a different algorithm. The parallel matrix multiplication algorithm multiplies two 40 x 40 matrices. The matrix multiplication algorithm that is used is:

Definitions:

    p - the total number of threads
    P(m) - thread with the unique id m
    n - the dimension of matrices
    a[0...(n-1)][0...(n-1)] - the first matrix
    b[0...(n-1)][0...(n-1)] - the second matrix
    c[0...(n-1)][0...(n-1)] - the product matrix

Algorithm (indices start at 0, matching the array definitions):

    for all P(m) where 0 <= m <= p-1 do
        for i = m to n-1 step p do
            for j = 0 to n-1 do
                t = 0
                for k = 0 to n-1 do
                    t = t + a[i][k] * b[k][j]
                endfor
                c[i][j] = t
            endfor
        endfor
    endfor

Analysis of Computing Resources

The ParallelBench benchmark requires a virtual machine to perform thread switching, basic math, and array indexing operations. Since threading is usually handled outside of the Java interpreter, ParallelBench tests the performance of a key subsystem of a virtual machine that cannot be optimized by running the code through a compiler or an optimized interpreter loop.

Special Notes

The ParallelBench benchmark contains eight tests. Each test is run three times, and the score for each test is the average of the three runs. The focus is on steady-state performance. All eight tests must be run to obtain an EEMBC ParallelBench score.
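The row-striped matrix multiplication above maps directly onto Java threads. The sketch below is a minimal rendering of that pseudocode under some assumptions. It uses 0-based indices, starts plain Thread objects instead of the benchmark's message queue, and the class and method names are hypothetical.

```java
// Sketch: thread m computes rows m, m+p, m+2p, ... of the product matrix.
class ParallelMatrixSketch {

    /** Multiplies two n x n matrices using p worker threads. */
    static int[][] multiply(final int[][] a, final int[][] b, final int p) {
        final int n = a.length;
        final int[][] c = new int[n][n];
        Thread[] workers = new Thread[p];

        for (int m = 0; m < p; m++) {
            final int id = m;
            workers[m] = new Thread(new Runnable() {
                public void run() {
                    // Each thread writes only its own rows, so no locking is needed.
                    for (int i = id; i < n; i += p) {
                        for (int j = 0; j < n; j++) {
                            int t = 0;
                            for (int k = 0; k < n; k++) {
                                t += a[i][k] * b[k][j];
                            }
                            c[i][j] = t;
                        }
                    }
                }
            });
            workers[m].start();
        }

        // join() waits for every worker and makes its writes visible to the caller.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 40; // matrix dimension used by the benchmark
        int[][] a = new int[n][n];
        int[][] b = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = (i == j) ? 1 : 0; // identity, so the product should equal a
            }
        }
        int[][] c = multiply(a, b, 4);
        System.out.println("product equals a: " + java.util.Arrays.deepEquals(a, c));
    }
}
```

Interleaving rows by thread id spreads the work evenly even when n is not a multiple of p, at the cost of each thread touching rows scattered across the matrix.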

Name: PNG

Highlights

- Decodes PNG (Portable Network Graphics) images
- PNG images are very common in J2ME applications

Application

The Png benchmark measures the time it takes to decode a PNG image, the standard format for image representation in J2ME implementations. This benchmark can decode multi-segmented PNG images, which are quite common. Because CLDC lacks APIs for graphics, this benchmark does not display the decoded image; it does, however, provide an ASCII representation of the decoded image in the verification output.

Analysis of Computing Resources

The Png benchmark decodes a PNG image, including decompression, and stores the result internally as header info, color palette(s), and image data. The benchmark is computationally intensive and also does a significant amount of data copying. The code runs to completion and its execution is timed directly, without running multiple iterations or using other timing variations. Therefore, lower times are better.

Special Notes

The largest of the XML input files for this benchmark is 19 KB and it contains over 200 XML tags. Although resource streams are used to access the image file, it is read into a ByteArrayInputStream before the actual timing begins. This is done to focus the benchmark on the computational aspects of Png, rather than on the implementation's access to static resources. Information about the PNG format may be found at http://www.libpng.org/pub/png. The image decoded by this benchmark is a 128x128 bit indexed grayscale image.
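The first stage of decoding, reading the header and walking the chunks of a possibly multi-segmented image, can be sketched as below. This is not the benchmark's decoder. The class name is hypothetical, CRCs are skipped instead of verified, no decompression is done, and the sample bytes are a synthetic chunk layout (zeroed CRCs, dummy data), not a valid image. Reading "multi-segmented" as an image whose data spans several IDAT chunks is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch: scan PNG chunks from memory, reading IHDR fields and counting IDAT segments.
class PngChunkSketch {
    static final int[] SIGNATURE = {137, 80, 78, 71, 13, 10, 26, 10};

    /** Returns {width, height, bitDepth, colorType, idatCount}. CRCs are not checked. */
    static int[] scan(byte[] png) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(png));
            for (int expected : SIGNATURE) {
                if (in.readUnsignedByte() != expected) {
                    throw new IllegalArgumentException("not a PNG stream");
                }
            }
            int[] info = new int[5];
            while (true) {
                int length = in.readInt();       // chunk data length, big-endian
                byte[] type = new byte[4];
                in.readFully(type);
                String name = new String(type, StandardCharsets.US_ASCII);
                if (name.equals("IHDR")) {
                    info[0] = in.readInt();          // width
                    info[1] = in.readInt();          // height
                    info[2] = in.readUnsignedByte(); // bit depth
                    info[3] = in.readUnsignedByte(); // color type
                    in.skipBytes(length - 10);       // compression, filter, interlace
                } else {
                    if (name.equals("IDAT")) {
                        info[4]++;                   // one more image-data segment
                    }
                    in.skipBytes(length);
                }
                in.skipBytes(4);                     // CRC, ignored in this sketch
                if (name.equals("IEND")) {
                    return info;
                }
            }
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated PNG stream", e);
        }
    }

    /** Builds a synthetic chunk sequence: IHDR, two IDAT segments, IEND. */
    static byte[] sample() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int b : SIGNATURE) {
                out.writeByte(b);
            }
            // Hypothetical header: 128 x 128, 8 bits per sample, color type 3 (indexed).
            byte[] ihdr = {0, 0, 0, (byte) 128, 0, 0, 0, (byte) 128, 8, 3, 0, 0, 0};
            writeChunk(out, "IHDR", ihdr);
            writeChunk(out, "IDAT", new byte[]{1, 2, 3});
            writeChunk(out, "IDAT", new byte[]{4, 5});
            writeChunk(out, "IEND", new byte[0]);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static void writeChunk(DataOutputStream out, String type, byte[] data) throws IOException {
        out.writeInt(data.length);
        out.write(type.getBytes(StandardCharsets.US_ASCII));
        out.write(data);
        out.writeInt(0); // placeholder CRC
    }

    public static void main(String[] args) {
        int[] info = scan(sample());
        System.out.println(info[0] + "x" + info[1] + ", IDAT segments: " + info[4]);
    }
}
```

A full decoder would concatenate the IDAT payloads, inflate them, and undo the per-row filters, which is where the decompression and data-copying cost noted above arises.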