GrinderBench software benchmark data book


Table of Contents

- Calculating the Grindermark
- Chess
- Crypto
- kxml
- Parallel
- PNG

Name: Calculating the Grindermark

The Grindermark and the GrinderBench Java Suite

EEMBC GrinderBench is a suite of five individual benchmark applications executed in the context of a benchmark framework. Each benchmark application is designed to mimic a complex real-world application, performing computations and operating on data relevant to that application scenario. Each benchmark application computes a single score at completion, so one complete execution of all benchmark applications yields five individual scores. These detailed scores offer the highest value to system designers, allowing comparison of the individual applications that are specific to their designs.

To simplify comparisons and enhance the presentation of comparative data on Java platform performance, a single-number score called the Grindermark can be computed in addition to the scores of the individual benchmark applications. Grindermark numbers are intended to provide a first-order representation of Java platform performance.

Grindermark Score Computation

Because EEMBC GrinderBench targets the broadest range of embedded Java platforms (including memory-limited CLDC 1.0 platforms), it was not possible to include the computation of the Grindermark score in the benchmark suite itself. There are two options for computing the Grindermark score once all five individual scores have been obtained:

- Use the online Grindermark score calculator at www.grinderbench.com/howto.html
- Compute the Grindermark score by taking the geometric mean of the five individual benchmark application scores:

  Grindermark score = geomean(Chess score, Crypto score, kxml score, Parallel score, PNG score)

Either option results in a correct Grindermark score. For a comparison of scores, see www.grinderbench.com/benchmarks

Using Geometric Mean versus Arithmetic Mean

EEMBC uses a geometric mean to calculate the Grindermark to ensure equal weighting for all five benchmarks in the GrinderBench suite. This is because the typical iterations per second achieved in each benchmark kernel tend to vary widely from kernel to kernel. If an arithmetic mean were used, kernels that tend to yield a relatively small number of iterations per second would have virtually no influence on the single-number score. In effect, an arithmetic mean of results would impose an arbitrary weighting system that heavily favors the tests with the most iterations per second. The use of the geometric mean avoids this problem.
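The difference between the two means can be illustrated with a small sketch. It assumes a Java SE runtime with floating-point support (CLDC 1.0 has none), and the class name, method names, and score values are hypothetical, chosen only for illustration.

```java
final class GrindermarkSketch {

    /** Geometric mean, computed through logarithms so the product cannot overflow. */
    static double geometricMean(double... scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one score is required");
        }
        double logSum = 0.0;
        for (double score : scores) {
            if (score <= 0.0) {
                throw new IllegalArgumentException("scores must be positive");
            }
            logSum += Math.log(score);
        }
        return Math.exp(logSum / scores.length);
    }

    /** Arithmetic mean, shown only for comparison. */
    static double arithmeticMean(double... scores) {
        double sum = 0.0;
        for (double score : scores) {
            sum += score;
        }
        return sum / scores.length;
    }

    public static void main(String[] args) {
        // Hypothetical scores in the order Chess, Crypto, kxml, Parallel, PNG.
        // These are made-up values, not measured results.
        double[] base = {2.0, 400.0, 30.0, 900.0, 15.0};
        double[] improved = base.clone();
        improved[0] *= 2.0; // double only the smallest score

        System.out.println("geometric mean gain:  "
                + geometricMean(improved) / geometricMean(base));
        System.out.println("arithmetic mean gain: "
                + arithmeticMean(improved) / arithmeticMean(base));
    }
}
```

With five scores, doubling any one of them raises the geometric mean by the same factor, 2^(1/5) or about 1.15, no matter how small that score is. With the made-up values above, the arithmetic mean moves by only about 0.15 percent.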

Name: Chess

Highlights

- Performs weighted tree searches
- Performs variable depth searches to prevent repetition
- No file I/O
- Plays three games of 10 moves each
- Method, logic and array intensive
- Low use of native methods

Application

Chess is a game with a predefined set of rules, played with 32 pieces on a board of 64 squares. For the electronic or computer version of chess, there are a variety of methods available for determining the best move.

Each piece has a set of possible moves, ranging from zero upwards. These moves are generated by search algorithms within the class representing each type of chess piece. The program can then map all possible moves, in all possible directions, against pieces that are in the way. Collision detection also has another role: weighting the decision toward the opponent's pieces in order to capture them. Positions that present an immediate danger of being captured are weighted against. Each piece type or class also has an array that represents favored positions on the board, which is also used as a weighting.

Electronic chess games have to think ahead. There's no use moving a piece if, after the next move, it or another piece can be taken. So the program can iterate through speculative moves and build up a tree of actual moves together with the possible moves, to form a better idea of whether a particular move is a good one. Thus the board can be scanned in 64 steps for all of a player's 16 pieces (assuming they are all still there!), and each of a piece's possible moves can be analyzed for the impact it makes. All these possible moves (favored or all) can be taken for further analysis at further depth.

This property of weighted problem solving with logic might explain the popularity of chess games for computers, with advanced games able to play out many thousands of potential scenarios at every point in the game. It is also the reason why chess makes a good test of the machine it is running on.
The chess benchmark only performs the logical parts of a chess program, as no graphical output is available. It plays a preset number of games against itself and times how long this takes.

Analysis of Computing Resources

For the chess benchmark, the code runs to completion by directly timing the code execution without utilizing other timing variations. Therefore, lower times are better. Overall, the chess benchmark is simple and should be executable on very small Java-enabled devices. It plays against itself for the preset number of moves, using the chess algorithms for the black and white pieces alternately.

Special Notes

It is possible to change the behavior by using the command line arguments - -debug or - -boards, and by supplying two numbers: <games> and <moves>.
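The weighting idea described in the Application text (a favored-position array per piece type, a bonus for captures, a penalty for exposed squares) can be sketched as follows. This is not the benchmark's code: the table contents, the piece values, and every name below are assumptions made for illustration.

```java
final class MoveWeightSketch {

    /** Hypothetical favored-position table: squares nearer the center score higher. */
    static final int[] FAVORED = new int[64];

    static {
        for (int square = 0; square < 64; square++) {
            int row = square / 8;
            int col = square % 8;
            FAVORED[square] = Math.min(row, 7 - row) + Math.min(col, 7 - col);
        }
    }

    /**
     * Weighs one candidate move.
     *
     * @param toSquare      destination square, 0..63
     * @param capturedValue value of the opposing piece on that square, or 0
     * @param exposed       true if the moved piece could be captured there
     * @param movingValue   value of the piece being moved
     */
    static int weigh(int toSquare, int capturedValue, boolean exposed, int movingValue) {
        int weight = FAVORED[toSquare];   // positional preference
        weight += capturedValue;          // weight toward capturing
        if (exposed) {
            weight -= movingValue;        // weight against immediate danger
        }
        return weight;
    }

    /** Returns the index of the highest-weighted candidate (first one wins ties). */
    static int bestIndex(int[] toSquares, int[] capturedValues, boolean[] exposed,
                         int movingValue) {
        int best = -1;
        int bestWeight = Integer.MIN_VALUE;
        for (int i = 0; i < toSquares.length; i++) {
            int w = weigh(toSquares[i], capturedValues[i], exposed[i], movingValue);
            if (w > bestWeight) {
                bestWeight = w;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] squares = {0, 27, 36};
        int[] captures = {0, 0, 0};
        boolean[] exposed = {false, true, false};
        System.out.println("chosen candidate: " + bestIndex(squares, captures, exposed, 3));
    }
}
```

A search to further depth would apply the same weighing to the replies that follow each candidate move. The sketch stops at a single ply to keep the weighting itself visible.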

Name: Crypto

Highlights

- Uses the Bouncy Castle Crypto package, which is based on the MIT X Consortium's work and is a clean room implementation of the JCE API
- Contains multiple encrypt/decrypt engines
- Encrypts and decrypts a 4 kbyte text string, using the System.currentTimeMillis() method to time the execution
- DES, DESede, IDEA, Blowfish, Twofish

Application

CryptoBench contains multiple encrypt/decrypt engines. A 4 kbyte text string is encrypted and then decrypted, using the System.currentTimeMillis() method to time the execution. The following encryption algorithms are exercised: DES, DESede, IDEA, Blowfish, Twofish.

The first argument is the key resource name, and the second argument is the text to encrypt and then decrypt. After fetching the key and the data, the benchmark runs, in succession, the encryption followed by the decryption for each of the algorithm types. The order of engine calling is as follows:

1. DES encrypt
2. DES decrypt
3. DESede encrypt
4. DESede decrypt
5. IDEA encrypt
6. IDEA decrypt
7. Blowfish encrypt
8. Blowfish decrypt
9. Twofish encrypt
10. Twofish decrypt

If the answer at the very end does not match what is expected, an error is printed. As with all EEMBC GrinderBench Java benchmarks, the entire benchmark is run five (5) times.

Analysis of Computing Resources

This is a mathematically intensive benchmark that uses integer math only.

Special Notes

Detailed specifications can be found at: http://www.bouncycastle.org/specifications.html. Each cryptographic engine must be executed to get the full score.
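The encrypt-then-decrypt timing pattern can be sketched roughly as below. Note the substitutions: this sketch uses the JDK's standard javax.crypto API, not the Bouncy Castle package the benchmark uses, and it covers only DES, DESede, and Blowfish because a standard JDK does not ship IDEA or Twofish. The class name, the randomly generated keys, and the ECB/PKCS5 transformation are assumptions made for illustration only.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

final class CryptoTimingSketch {

    // Subset of the benchmark's algorithm list that a standard JDK provides.
    static final String[] ALGORITHMS = {"DES", "DESede", "Blowfish"};

    /** Encrypts then decrypts the text and returns the elapsed milliseconds. */
    static long roundTripMillis(String algorithm, byte[] plainText) {
        try {
            // Key generation is done before the clock starts.
            SecretKey key = KeyGenerator.getInstance(algorithm).generateKey();
            Cipher cipher = Cipher.getInstance(algorithm + "/ECB/PKCS5Padding");

            long start = System.currentTimeMillis();
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] encrypted = cipher.doFinal(plainText);
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] decrypted = cipher.doFinal(encrypted);
            long elapsed = System.currentTimeMillis() - start;

            // Verify the result, as the benchmark does at the end of its run.
            if (!Arrays.equals(plainText, decrypted)) {
                throw new IllegalStateException(algorithm + ": round trip mismatch");
            }
            return elapsed;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(algorithm + " is not usable on this runtime", e);
        }
    }

    public static void main(String[] args) {
        byte[] text = new byte[4096]; // 4 kbyte input, filled with a placeholder byte
        Arrays.fill(text, (byte) 'x');
        for (String algorithm : ALGORITHMS) {
            System.out.println(algorithm + ": " + roundTripMillis(algorithm, text) + " ms");
        }
    }
}
```

On a fast machine a single 4 kbyte round trip may well report 0 ms, because System.currentTimeMillis() has millisecond granularity. That is one reason a benchmark repeats the whole run several times.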

Name: kxml

Highlights

- Utilizes the kxml XML parsing package
- Tests XML parsing and DOM tree manipulation
- Very flexible; execution is controlled by command scripts

Application

The kxml benchmark measures XML parsing and/or DOM tree manipulation. The actual parsing and manipulation is done by the kxml package, which is available under a modified open-source license. Details are available at http://www.kxml.org. This package is designed for use in small-footprint environments.

The benchmark takes a command script as input; the script specifies the XML documents to parse and the DOM tree manipulations to perform. The script may contain any sensible combination of the following commands:

- Parse an XML document and store it as a DOM tree representation
- Parse an XML document and insert it into an existing DOM tree at the specified node
- Search a DOM tree (already in memory) for a particular element name
- Search a DOM tree (already in memory) for a particular text string
- Create a DOM tree with empty nodes
- Delete DOM trees

A hash table of DOM trees is maintained by the benchmark, so that each command may refer to and make use of the results of previous commands.

Analysis of Computing Resources

For the kxml benchmark, the code runs to completion by directly timing the code execution without running multiple iterations or utilizing other timing variations. Therefore, lower times are better. Overall, the kxml benchmark is simple and should be executable on very small Java-enabled devices. It utilizes the kxml package, which should provide it with a more generic stability.

Special Notes

Although resource streams are used to access the input data (command script and XML document(s)), these are read into ByteArrayInputStreams before the actual timing begins. This is done to focus the benchmark on the computational aspects of XML parsing and DOM tree manipulation, rather than the implementation's access to static resources.
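The pre-loading step from the Special Notes might look roughly like the sketch below: the input is first copied into memory, and only the work done on the in-memory stream is timed. The kxml parser itself is not used here. The countTagOpeners method is a hypothetical stand-in for the parsing work, and the class name and sample document are likewise assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PreloadTimingSketch {

    /** Copies the whole stream into memory; this happens before timing starts. */
    static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[512];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical stand-in for the parser: counts '<' characters in the input. */
    static int countTagOpeners(ByteArrayInputStream in) {
        int count = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '<') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stands in for a resource stream; a real run would open the XML resource here.
        byte[] raw = "<doc><item>a</item><item>b</item></doc>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] data = readFully(new ByteArrayInputStream(raw)); // untimed resource access

        long start = System.currentTimeMillis();                // timing begins here
        int tags = countTagOpeners(new ByteArrayInputStream(data));
        long elapsed = System.currentTimeMillis() - start;

        System.out.println(tags + " tag openers in " + elapsed + " ms");
    }
}
```

Keeping the copy outside the timed region means a slow resource loader does not distort a score that is meant to reflect parsing and tree manipulation.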

Name: Parallel

Highlights

- Tests the performance of thread switching and synchronization in a Java virtual machine
- Two parallel algorithms are executed separately
- Each thread executes simple mathematical and array sorting operations

Application

The ParallelBench benchmark tests the performance of KVM threading capabilities. It accomplishes this by dividing computational tasks among several threads that must then cooperate with each other to complete those tasks. Two parallel algorithms are used. First, the benchmark executes a mergesort algorithm, which sorts a list by having each thread sort a subset of the list and then merging the sublists. The second is a parallel matrix multiplication algorithm, which multiplies two matrices by having each thread work on a block of values.

The ability to quickly switch threads is a key component of MIDP applications. Most MIDP applications work by having a minimum of two threads running concurrently. One thread drives or updates the state of the application while another processes user interface events. To create a good user experience, the virtual machine must quickly switch between the event-handling thread and the thread that maintains the application state.

The ParallelBench benchmark completes tests that run a mergesort algorithm and a matrix multiplication algorithm using thread counts of 2, 4, 8, and 16.

The ParallelBench benchmark first executes the mergesort algorithm. The algorithm begins by dividing an unsorted list into P equal-length sublists (where P is the number of threads being used). It then starts the worker threads by sending messages to the message queue. Each message contains information about the array and the portion to sort. The worker threads sort each sublist using the bubblesort algorithm. After the threads complete the sorting of their respective lists, P/2 threads merge the sublists together.
The merging of sublists in separate threads is repeated until all the sublists are merged into a single list.

After completing the mergesort tests, ParallelBench tests thread processing by using a different algorithm. The parallel matrix multiplication algorithm multiplies two 40 x 40 matrices. The matrix multiplication algorithm that is used is:

Definitions:
    p - the total number of threads
    P(m) - thread with the unique id m
    n - the dimension of matrices
    a[0...(n-1)][0...(n-1)] - the first matrix
    b[0...(n-1)][0...(n-1)] - the second matrix
    c[0...(n-1)][0...(n-1)] - the product matrix

Algorithm:
    for all P(m) where 0 <= m <= p-1 do
        for i = m to n-1 step p do
            for j = 0 to n-1 do
                t = 0
                for k = 0 to n-1 do
                    t = t + a[i][k] * b[k][j]
                endfor
                c[i][j] = t
            endfor
        endfor
    endfor

Analysis of Computing Resources

The ParallelBench benchmark requires a virtual machine to perform thread switching, basic math, and array indexing operations. Since threading is usually handled outside of the Java interpreter, ParallelBench tests the performance of a key subsystem of a virtual machine that cannot be optimized by running the code through a compiler or optimized interpreter loop.

Special Notes

The ParallelBench benchmark contains eight tests. Each test is run three times. The score for each is the average of the three runs. Focus is on a steady-state. All eight tests must be run to obtain an EEMBC ParallelBench Score.
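The mergesort scheme described in this section (P sublists bubble-sorted by separate threads, then merged pairwise until one list remains) can be sketched as follows. One simplification is assumed: threads are started and joined directly in place of the benchmark's message queue. P is taken to be a power of two, and all names are hypothetical.

```java
final class ParallelSortSketch {

    /** Start index of sublist number index when n elements are split into p parts. */
    static int bound(int n, int index, int p) {
        return (int) ((long) n * index / p);
    }

    /** Bubble sort of the half-open range [from, to). */
    static void bubbleSort(int[] a, int from, int to) {
        for (int end = to - 1; end > from; end--) {
            for (int i = from; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
    }

    /** Merges the sorted ranges [from, mid) and [mid, to) in place. */
    static void merge(int[] a, int from, int mid, int to) {
        int[] tmp = new int[to - from];
        int i = from, j = mid, k = 0;
        while (i < mid && j < to) {
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < to) tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, from, tmp.length);
    }

    /** Starts every worker, then waits for all of them to finish. */
    static void runAll(Thread[] workers) {
        for (Thread t : workers) t.start();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Sorts a using p threads; p is assumed to be a power of two (2, 4, 8, 16). */
    static void parallelSort(int[] a, int p) {
        int n = a.length;
        Thread[] sorters = new Thread[p];
        for (int t = 0; t < p; t++) {
            int from = bound(n, t, p);
            int to = bound(n, t + 1, p);
            sorters[t] = new Thread(() -> bubbleSort(a, from, to));
        }
        runAll(sorters);

        // Merge rounds: P/2 threads first, then P/4, until one list remains.
        for (int width = 1; width < p; width *= 2) {
            Thread[] mergers = new Thread[p / (2 * width)];
            for (int m = 0; m < mergers.length; m++) {
                int first = m * 2 * width;
                int from = bound(n, first, p);
                int mid = bound(n, first + width, p);
                int to = bound(n, first + 2 * width, p);
                mergers[m] = new Thread(() -> merge(a, from, mid, to));
            }
            runAll(mergers);
        }
    }

    public static void main(String[] args) {
        int[] data = {9, 3, 7, 1, 8, 2, 6, 4};
        parallelSort(data, 4);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

Each merge round works on disjoint ranges, so the threads within a round need no locking. The join at the end of each round is the only synchronization point.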
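A Java rendering of the row-interleaved matrix multiplication might look like the sketch below. It uses zero-based thread ids and indices, and assumes int elements and direct thread start/join; the element type and thread management of the actual benchmark are not stated in this section. Class and method names are hypothetical.

```java
final class ParallelMatMulSketch {

    /** Multiplies two n x n matrices using p threads; thread m handles rows m, m+p, m+2p, ... */
    static int[][] multiply(int[][] a, int[][] b, int p) {
        int n = a.length;
        int[][] c = new int[n][n];
        Thread[] workers = new Thread[p];

        for (int m = 0; m < p; m++) {
            int id = m; // unique id of this thread
            workers[m] = new Thread(() -> {
                for (int i = id; i < n; i += p) {
                    for (int j = 0; j < n; j++) {
                        int t = 0;
                        for (int k = 0; k < n; k++) {
                            t += a[i][k] * b[k][j];
                        }
                        c[i][j] = t;
                    }
                }
            });
        }

        for (Thread worker : workers) {
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 40; // matrix dimension named in the text
        int[][] a = new int[n][n];
        int[][] identity = new int[n][n];
        for (int i = 0; i < n; i++) {
            identity[i][i] = 1;
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j; // arbitrary fill values
            }
        }
        int[][] c = multiply(a, identity, 4);
        System.out.println("a * I equals a: " + java.util.Arrays.deepEquals(a, c));
    }
}
```

Because each thread writes only its own rows of c, no locking is needed. The work is mostly arithmetic and array indexing, with thread scheduling as the variable under test.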

Name: PNG

Highlights

- Decodes PNG (Portable Network Graphics) images
- PNG images are very common in J2ME applications

Application

The Png benchmark measures the time it takes to decode a PNG image, the standard format for image representation in J2ME implementations. This benchmark can decode multi-segmented PNG images, which are quite common. Because CLDC lacks APIs for graphics, this benchmark does not display any of the decoded image; however, it does provide an ASCII representation of the decoded image in the verification output.

Analysis of Computing Resources

The Png benchmark decodes a PNG image, including decompression, and stores the result internally as header info, color palette(s), and image data. The benchmark is computationally intensive and also does a significant amount of data copying. The code runs to completion by directly timing the code execution without running multiple iterations or utilizing other timing variations. Therefore, lower times are better.

Special Notes

The largest of the XML input files for this benchmark is 19kbytes and it contains over 200 XML tags. Although resource streams are used to access the image file, it is read into a ByteArrayInputStream before the actual timing begins. This is done to focus the benchmark on the computational aspects of Png, rather than the implementation's access to static resources. Information about the PNG format may be found at http://www.libpng.org/pub/png. The image decoded by this benchmark is a 128x128 bit indexed grayscale image.
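The "header info" part of decoding can be illustrated with a short sketch that checks the 8-byte PNG signature and reads the leading IHDR chunk from an in-memory stream. It does not decompress image data or read palettes, and it is not the benchmark's decoder. The class name is hypothetical, and the bit depth used in main is an assumed value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class PngHeaderSketch {

    /** The fixed 8-byte signature that starts every PNG stream. */
    static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    /** Chunk type "IHDR" as a big-endian int. */
    static final int IHDR = 0x49484452;

    /** Returns {width, height, bitDepth, colorType} read from the IHDR chunk. */
    static int[] readHeader(byte[] png) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(png))) {
            for (byte expected : SIGNATURE) {
                if (in.readByte() != expected) {
                    throw new IllegalArgumentException("not a PNG stream");
                }
            }
            int length = in.readInt(); // chunk data length, 13 for IHDR
            int type = in.readInt();   // chunk type
            if (length != 13 || type != IHDR) {
                throw new IllegalArgumentException("first chunk is not IHDR");
            }
            int width = in.readInt();
            int height = in.readInt();
            int bitDepth = in.readUnsignedByte();
            int colorType = in.readUnsignedByte(); // 3 means indexed color
            return new int[] {width, height, bitDepth, colorType};
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated PNG stream", e);
        }
    }

    public static void main(String[] args) {
        // Hand-built signature plus IHDR data for a 128x128 indexed image.
        // The bit depth of 8 is an assumption made for this example.
        byte[] png = {
            (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A,
            0, 0, 0, 13, 'I', 'H', 'D', 'R',
            0, 0, 0, (byte) 128, 0, 0, 0, (byte) 128,
            8, 3, 0, 0, 0
        };
        int[] header = readHeader(png);
        System.out.println(header[0] + "x" + header[1]
                + ", bit depth " + header[2] + ", color type " + header[3]);
    }
}
```

A full decoder would go on to verify each chunk's CRC, read the PLTE palette, and inflate the IDAT data. Those steps account for the decompression and data copying mentioned above.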