Performance Evaluation Java Collections Framework. Technical Whitepaper.


Performance Evaluation Java Collections Framework
TECHNICAL WHITEPAPER
Author: Kapil Viren Ahuja
Date: October 17, 2008

Table of Contents

1 Introduction
  1.1 Scope of this document
  1.2 Intended audience
2 Evaluation Approach
  2.1 Comparison parameters
  2.2 Comparison scenarios
  2.3 Environment
  2.4 Execution and sampling
3 Measurements
  3.1 Insertion of unique elements (Long)
  3.2 Comparison of unique elements (Element)
  3.3 Comparison of non-unique elements (Element)
  3.4 Iteration over elements (Long)
  3.5 Iteration over elements (Element)
Appendices
  A Data Structure for custom class
  B List of Tables
  C Change Log

1 INTRODUCTION

Managing a list or collection of objects is a very common scenario, and managing that collection effectively, so that it delivers optimum performance, is an equally common need. The Java programming language offers many built-in data types for representing and modeling collections of objects. Some of the commonly used data types are:

java.util.ArrayList
java.util.HashSet
java.util.TreeMap

Each of these data types behaves differently under different scenarios. When writing algorithms that must deliver the highest levels of performance, it is necessary to make the right choice, and for many developers and architects that choice is not an easy one. This document provides details of a comparison done across various data types supported by the Java Collections Framework and studies their performance under different circumstances.

1.1 Scope of this document

This document provides performance data for various data types in the Java Collections Framework. It does not provide details of the Collections Framework itself, nor does it explain the data. The results are a factor of how each collection data type is implemented in Java and hence are subject to change from one implementation of the Java Virtual Machine specification to another. Developers who are interested in the reasons behind the performance are encouraged to read the Java documentation on the Sun Microsystems website.

This document does not contain any recommendations. It only covers performance results in a specific environment. How you interpret and use the results is entirely up to you.

1.2 Intended audience

All Java developers who use or intend to use the Java Collections Framework while developing an application and who want to decide which collection data type to use in a given scenario.

2 EVALUATION APPROACH

To benchmark the performance we have to establish some common rules that can be applied consistently to various scenarios. These are listed below:

1. Comparison parameters
2. Comparison scenarios
3. Environment
4. Execution and sampling

2.1 Comparison parameters

For any benchmark to succeed, it is critical that the parameters are identified upfront. This helps in making a consistent comparison. We selected four parameters for our comparison. These are explained below.

Collection size

The first parameter used in the benchmarking process is the size of the collection itself, that is, the number of elements contained in the collection. Performance was benchmarked for sizes from 1000 to 100,000 in multiples of 10. We did not consider sizes less than 10000, as the results for the different data types were similar. In addition, we did not consider a size of 1,000,000 because we were running into Java heap size issues.

Collection type

The second parameter used in the benchmarking process is the data type from the Java Collections Framework. The data types are listed below:

1. ArrayList
2. LinkedList
3. HashSet
4. TreeSet
5. Vector
6. HashMap
7. TreeMap
8. LinkedHashMap
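As a rough illustration of how the eight collection types listed above might be populated for a given collection size, the sketch below fills each one with Long values. The class name CollectionTypesSketch, the helper method names, and the choice of using each value as both key and value for the Map types are assumptions made for illustration; they are not details taken from the benchmark.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.Vector;

final class CollectionTypesSketch {

    /** Fresh instances of the five Collection types named in the list. */
    static List<Collection<Long>> newCollections() {
        List<Collection<Long>> all = new ArrayList<Collection<Long>>();
        all.add(new ArrayList<Long>());
        all.add(new LinkedList<Long>());
        all.add(new HashSet<Long>());
        all.add(new TreeSet<Long>());
        all.add(new Vector<Long>());
        return all;
    }

    /** Fresh instances of the three Map types named in the list. */
    static List<Map<Long, Long>> newMaps() {
        List<Map<Long, Long>> all = new ArrayList<Map<Long, Long>>();
        all.add(new HashMap<Long, Long>());
        all.add(new TreeMap<Long, Long>());
        all.add(new LinkedHashMap<Long, Long>());
        return all;
    }

    /** Inserts the values 0..size-1 into a Collection. */
    static void fill(Collection<Long> target, int size) {
        for (long i = 0; i < size; i++) {
            target.add(Long.valueOf(i));
        }
    }

    /** Inserts the values 0..size-1 into a Map (assumption: key == value). */
    static void fill(Map<Long, Long> target, int size) {
        for (long i = 0; i < size; i++) {
            Long value = Long.valueOf(i);
            target.put(value, value);
        }
    }

    public static void main(String[] args) {
        int size = 10000; // one of the collection sizes used in the measurements
        for (Collection<Long> c : newCollections()) {
            fill(c, size);
            System.out.println(c.getClass().getSimpleName() + " holds " + c.size());
        }
        for (Map<Long, Long> m : newMaps()) {
            fill(m, size);
            System.out.println(m.getClass().getSimpleName() + " holds " + m.size());
        }
    }
}
```

Because the inserted values are all distinct, every collection ends up holding exactly `size` elements, which corresponds to the unique-element case.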

Developers who want to understand the Java Collections Framework are encouraged to read more at the following links: Wikipedia, IBM.

Data type of the elements stored

Another parameter used in the benchmarking process is the data type of the elements stored in the collection. Data types used were primitive, in-built and user-defined. The intention of using all three kinds of data types is to provide coverage across all kinds of data types. These are listed below:

1. In-built data type: java.lang.Long
2. User-defined data type: We created a custom class called "Element" and created instances of this class with random data during the exercise. Its structure is defined in Appendix A, Data Structure for custom class.

Sample size

The fourth and last parameter used in the benchmarking process is the sample size. It is a very common practice to repeat a process several times and collect data points. This ensures that the behavior has been tested for consistency, and a correlation can then easily be drawn on the data set. We performed 10 iterations for every scenario.

2.2 Comparison scenarios

To benchmark the performance, we identified a few very commonly used scenarios. These are explained below.

Insertion

One of the most basic requirements of a collection is to insert one or more elements into it. This scenario deals with that common use case. We evaluated two aspects of the scenario:

1. In the first scenario, we inserted unique elements in a collection. We used the value returned by the hashCode method to identify the uniqueness of an element. This was tested for elements of data type Long, as two different Long objects return different values.
2. In the second scenario, we inserted non-unique elements in a collection. To create non-unique elements, the hashCode method of the Element class was overridden to always return the same value. This case was tried only on the Set and Map data types, because only these two types support filtering out non-unique elements.

Iteration

Another very common use case is to iterate over a collection. We observed that, in most cases, iteration over a collection is used more frequently than insertion and deletion of elements.

2.3 Environment

The results of any performance benchmark depend on the environment in which the data is collected. For the purposes of this benchmark, the system specifications are listed below.

Hardware specifications   Value
Processor                 Intel Core 2 Duo CPU T8100
Number of CPUs            2
CPU speed                 Both cores @ 2.10 GHz
RAM model and make        3070 MB

Table 1: Hardware specifications

Software specifications   Value
Operating System          Windows Vista Home Premium
Java runtime              Java 2 Runtime Environment, Standard Edition (build 1.5.0_08-b03)
IDE                       Eclipse 3.3.2 Build id: M20080221-1800

Table 2: Software specifications
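To make the second insertion scenario more concrete, the sketch below uses a hypothetical CollidingElement class (a stand-in, not the paper's Element) whose hashCode always returns 1. It assumes that equals still compares identifiers, as in Appendix A. Under that assumption a HashSet keeps every element with a distinct identifier, but all of them share one hash value and therefore one bucket.

```java
import java.util.HashSet;
import java.util.Set;

final class CollisionSketch {

    /** Hypothetical element type: every instance reports the same hash code. */
    static final class CollidingElement {
        private final Long identifier;

        CollidingElement(Long identifier) {
            this.identifier = identifier;
        }

        @Override
        public int hashCode() {
            return 1; // constant value, as described for the second scenario
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof CollidingElement)) {
                return false;
            }
            return identifier.equals(((CollidingElement) obj).identifier);
        }
    }

    /** Inserts count elements with distinct identifiers; returns the set size. */
    static int insertColliding(int count) {
        Set<CollidingElement> set = new HashSet<CollidingElement>();
        for (long i = 0; i < count; i++) {
            set.add(new CollidingElement(Long.valueOf(i)));
        }
        return set.size();
    }

    public static void main(String[] args) {
        // Distinct identifiers are all retained even though the hash codes collide.
        System.out.println(insertColliding(1000)); // prints 1000
    }
}
```

With a constant hash code, each insertion may have to be compared against elements already stored in the same bucket, which is consistent with this scenario being much slower than the insertion of unique elements.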

2.4 Execution and sampling

During the benchmarking exercise, all the scenarios were run as per the parameters agreed upon. Two approaches were available to us for recording samples for the benchmark. These are listed below.

Iterations for scenarios

In this approach, we considered one scenario as one sample. We then iterated over the same scenario 10 times and collected samples. For example, we inserted 10 records in an ArrayList and recorded one sample. This sample was the time taken to insert 10 records in the collection. We repeated the process 10 times.

Elements in a collection

In this approach, we considered one element as one sample. For example, we inserted 10 records in an ArrayList and recorded one sample per record, each being the time taken to insert that element in the collection. At the end of the use case, we had 10 samples.

For the purposes of this evaluation we opted for the former approach, because in most common cases a user is interested in the performance of one complete operation. The latter approach is less useful because it does not provide diversified samples, so measuring the predictability of the collection is not feasible. In addition, collecting samples per iteration ensures that any variation while adding elements to the collection is captured.

Interpretation of results

The benchmark was prepared using various mathematical parameters. These are listed below.

S. No  Description                                                                      Symbol  Unit
1      Iterations: The total number of times a specific scenario was performed          n       N.A.
2      Minimum time: Minimum amount of time taken to complete an iteration              min     µs
3      Maximum time: Maximum amount of time taken to complete an iteration              max     µs
4      Total time: Total time taken to complete all iterations                          T       µs
5      Average time: Average time taken per iteration across all iterations             M       µs
6      Standard Deviation: Standard deviation of the operation from mean                σ       µs
7      Number of samples that are outside 1 sigma                                       m±σ     N.A.
8      Number of samples that are outside 2 sigma                                       m±2σ    N.A.
9      Number of samples that are outside 3 sigma                                       m±3σ    N.A.

Table 3: Mathematical parameters for benchmark

To compare collections for a given scenario, the following two factors should be looked at together:

1. Average time: This represents the average time consumed to perform one iteration of a scenario. It is calculated as the average of the times taken for all the iterations.
2. Samples outside sigma: Standard deviation is the factor that determines the stability of a distribution. A distribution is considered most stable if it follows a normal distribution. Under a normal distribution, the fewer samples that fall outside the lower and upper control limits defined by the mean and the standard deviation, the more stable the distribution is said to be.

When comparing two or more data types, we should look for the data type that is the fastest in a given scenario. However, if the faster data type is less stable, we cannot predict the same performance every time. In a real scenario this means there is a higher probability that the data type will run slower or faster than expected. In that case a more stable data type, even if it is a little slower in execution, is the better option.
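The parameters in Table 3 can be sketched in code as follows. This is a minimal illustration, not the benchmark's actual harness: the class name SampleStats, the use of System.nanoTime for timing, and the choice of the population standard deviation (dividing by n) are all assumptions, since the paper does not state how the values were computed.

```java
import java.util.ArrayList;
import java.util.List;

final class SampleStats {

    /** Runs the scenario n times; each complete run is one sample, in microseconds. */
    static double[] collectSamples(Runnable scenario, int iterations) {
        double[] samples = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            scenario.run();
            samples[i] = (System.nanoTime() - start) / 1000.0;
        }
        return samples;
    }

    /** Average time M. */
    static double mean(double[] samples) {
        double total = 0;
        for (double s : samples) {
            total += s;
        }
        return total / samples.length;
    }

    /** Standard deviation sigma (assumption: population form, divide by n). */
    static double standardDeviation(double[] samples) {
        double m = mean(samples);
        double sumSquares = 0;
        for (double s : samples) {
            sumSquares += (s - m) * (s - m);
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    /** Number of samples lying outside mean +/- k * sigma. */
    static int countOutside(double[] samples, int k) {
        double m = mean(samples);
        double sd = standardDeviation(samples);
        int count = 0;
        for (double s : samples) {
            if (Math.abs(s - m) > k * sd) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One scenario = inserting 10000 Long values into a new ArrayList.
        double[] samples = collectSamples(() -> {
            List<Long> list = new ArrayList<Long>();
            for (long i = 0; i < 10000; i++) {
                list.add(Long.valueOf(i));
            }
        }, 10);
        System.out.println("M = " + mean(samples) + " us");
        for (int k = 1; k <= 3; k++) {
            System.out.println("outside " + k + " sigma: " + countOutside(samples, k));
        }
    }
}
```

For the sample set {2, 4, 4, 4, 5, 5, 7, 9}, the mean is 5 and the population standard deviation is 2, so two samples (2 and 9) lie outside one sigma and none lie outside two sigma.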

3 MEASUREMENTS

3.1 Insertion of unique elements (Long)

Comparison for data size of 10000 elements (Long)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       0.8      2    1     1
LinkedList      0.8      12   1     0
HashSet         1.28     1    1     1
TreeSet         6.8      2    1     1
Vector          1.2      1    1     1
HashMap         1.48     5    1     1
TreeMap         6.36     5    1     1
LinkedHashMap   1.88     1    1     1

Table 4: Results for insertion of 10000 unique elements (Long)

Comparison for data size of 100000 elements (Long)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       25.6     4    1     0
LinkedList      54.96    1    1     7
HashSet         63.12    2    1     1
TreeSet         126.08   2    2     2
Vector          20.04    5    3     0
HashMap         42.68    4    3     0
TreeMap         99.12    7    1     0
LinkedHashMap   59.12    8    2     0

Table 5: Results for insertion of 100000 unique elements (Long)

Comparison for data size of 1000000 elements (Long)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       165.44   1    1     1
LinkedList      378.52   1    1     1
HashSet         675.24   3    1     1
TreeSet         1253.16  6    2     0
Vector          231      1    1     1
HashMap         711.52   1    1     1
TreeMap         1215.25  10   1     0
LinkedHashMap   863.48   2    1     1

Table 6: Results for insertion of 1000000 unique elements (Long)

3.2 Comparison of unique elements (Element)

Comparison for data size of 10000 elements

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       3.12     4    1     1
LinkedList      1.28     1    1     1
HashSet         1.28     2    2     2
Vector          1.28     2    2     2
HashMap         1.24     1    1     1
LinkedHashMap   1.92     2    2     1

Table 7: Results for insertion of 10000 unique elements (Element)

Comparison for data size of 100000 elements

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       51.76    1    1     1
LinkedList      45.52    7    0     0
HashSet         52.44    3    2     1
Vector          20.6     4    2     1
HashMap         1.28     2    2     2
LinkedHashMap   78.04    9    1     0

Table 8: Results for insertion of 100000 unique elements (Element)

Comparison for data size of 500000 elements

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       188.48   1    1     1
LinkedList      298.44   2    1     1
HashSet         444.2    1    1     1
Vector          169.08   11   0     0
HashMap         437.4    1    1     1
LinkedHashMap   530.44   1    1     1

Table 9: Results for insertion of 500000 unique elements (Element)

3.3 Comparison of non-unique elements (Element)

Comparison for data size of 10000 elements

Collection      M        m±σ  m±2σ  m±3σ
HashSet         1218.08  12   0     0
HashMap         1233.08  4    2     0
LinkedHashMap   1234.96  9    0     0

Table 10: Results for insertion of 10000 non-unique elements (Element)

Note that the time taken to insert 10000 non-unique elements is significantly greater than for unique elements. This clearly shows that such cases should be avoided unless necessary. Because of this observation, we did not carry out any further benchmarking of this scenario.

3.4 Iteration over elements (Long)

Comparison for data size of 10000 elements (Long)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       0.64     1    1     1
LinkedList      0        0    0     0
HashSet         0.64     1    1     1
TreeSet         0        0    0     0
Vector          0        0    0     0
HashMap         1.84     3    3     0
TreeMap         5.64     9    0     0
LinkedHashMap   2.56     4    4     0

Table 11: Results for iteration of 10000 elements (Long)

Comparison for data size of 100000 elements (Long)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       7.52     1    1     0
LinkedList      8        1    1     1
HashSet         17       1    1     1
TreeSet         9.96     11   1     1
Vector          4.4      7    0     0
HashMap         28.76    11   1     0
TreeMap         97.96    6    1     0
LinkedHashMap   71.12    10   0     0

Table 12: Results for iteration of 100000 elements (Long)

Comparison for data size of 1000000 elements (Long)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       33.8     2    2     1
LinkedList      33.08    8    1     1
HashSet         53.64    2    2     0
TreeSet         35.12    5    3     0
Vector          29.4     7    2     0
HashMap         683.16   1    1     1
TreeMap         1219.92  4    1     1
LinkedHashMap   844.96   1    1     1

Table 13: Results for iteration of 1000000 elements (Long)

3.5 Iteration over elements (Element)

Comparison for data size of 10000 elements (Element)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       2.48     4    4     0
LinkedList      1.28     1    1     1
HashSet         1.28     1    1     1
Vector          1.28     2    2     2
HashMap         1.28     1    1     1
LinkedHashMap   1.88     3    3     0

Table 14: Results for iteration of 10000 elements (Element)

Comparison for data size of 100000 elements (Element)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       51.16    1    1     1
LinkedList      54.92    19   0     0
HashSet         54.32    3    3     1
Vector          19.26    4    1     1
HashMap         51.84    8    3     0
LinkedHashMap   80.96    7    0     0

Table 15: Results for iteration of 100000 elements (Element)

Comparison for data size of 1000000 elements (Element)

Collection      M        m±σ  m±2σ  m±3σ
ArrayList       187.28   1    1     1
LinkedList      325.12   1    1     1
HashSet         470.44   5    1     1
Vector          177.24   15   0     0
HashMap         448.4    5    1     1
LinkedHashMap   548.44   1    1     1

Table 16: Results for iteration of 1000000 elements (Element)
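The iteration scenario measured in sections 3.4 and 3.5 can be pictured roughly as follows. How the benchmark actually traversed each collection is not stated, so the enhanced for loop, the entrySet() traversal for the Map types, the summing of values, and the name IterationSketch are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IterationSketch {

    /** Visits every element of a Collection once and sums the values. */
    static long sum(Collection<Long> collection) {
        long total = 0;
        for (Long value : collection) {
            total += value.longValue();
        }
        return total;
    }

    /** Visits every entry of a Map once and sums the values. */
    static long sumValues(Map<Long, Long> map) {
        long total = 0;
        for (Map.Entry<Long, Long> entry : map.entrySet()) {
            total += entry.getValue().longValue();
        }
        return total;
    }

    public static void main(String[] args) {
        int size = 100000;
        List<Long> list = new ArrayList<Long>();
        Map<Long, Long> map = new TreeMap<Long, Long>();
        for (long i = 0; i < size; i++) {
            list.add(Long.valueOf(i));
            map.put(Long.valueOf(i), Long.valueOf(i));
        }

        // Time one complete pass over each structure (one sample each).
        long start = System.nanoTime();
        long listSum = sum(list);
        long listMicros = (System.nanoTime() - start) / 1000;

        start = System.nanoTime();
        long mapSum = sumValues(map);
        long mapMicros = (System.nanoTime() - start) / 1000;

        System.out.println("ArrayList: sum=" + listSum + ", " + listMicros + " us");
        System.out.println("TreeMap:   sum=" + mapSum + ", " + mapMicros + " us");
    }
}
```

Summing the values gives the loop a visible result, so the traversal is less likely to be optimized away; a single timed pass such as this is only one sample and would be repeated to obtain the figures described in Table 3.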

Appendix A
DATA STRUCTURE FOR CUSTOM CLASS

    package com.kapil.spikes.collections;

    public class Element {

        private Long identifier;

        public Element(Long identifier) {
            this.identifier = identifier;
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((identifier == null) ? 0 : identifier.hashCode());
            // Returning a constant value of 1 instead gives every object the same hash code
            // return 1;
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            final Element other = (Element) obj;
            if (identifier == null) {
                if (other.identifier != null)
                    return false;
            } else if (!identifier.equals(other.identifier))
                return false;
            return true;
        }
    }
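A short usage sketch follows, under the assumption that Element is used as a set member or map key as in the benchmark. Two instances built from the same identifier are equal and share a hash code, so a HashSet keeps only one of them. The nested copy of Element is condensed from the listing above, and ElementUsageSketch and distinctCount are hypothetical names introduced for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class ElementUsageSketch {

    /** Condensed copy of the appendix class, nested to keep the sketch self-contained. */
    static final class Element {
        private final Long identifier;

        Element(Long identifier) {
            this.identifier = identifier;
        }

        @Override
        public int hashCode() {
            // Same value as prime * result + hash with prime = 31 and result = 1.
            return 31 + ((identifier == null) ? 0 : identifier.hashCode());
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            Element other = (Element) obj;
            return identifier == null
                    ? other.identifier == null
                    : identifier.equals(other.identifier);
        }
    }

    /** Adds one Element per identifier and returns how many the set retained. */
    static int distinctCount(long... identifiers) {
        Set<Element> set = new HashSet<Element>();
        for (long id : identifiers) {
            set.add(new Element(Long.valueOf(id)));
        }
        return set.size();
    }

    public static void main(String[] args) {
        // The identifier 2 appears twice, so only three elements are retained.
        System.out.println(distinctCount(1, 2, 2, 3)); // prints 3
    }
}
```

Keeping equals and hashCode based on the same field is what allows the Set and Map types to filter out repeated elements.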

Appendix B
LIST OF TABLES

Table 1: Hardware specifications
Table 2: Software specifications
Table 3: Mathematical parameters for benchmark
Table 4: Results for insertion of 10000 unique elements (Long)
Table 5: Results for insertion of 100000 unique elements (Long)
Table 6: Results for insertion of 1000000 unique elements (Long)
Table 7: Results for insertion of 10000 unique elements (Element)
Table 8: Results for insertion of 100000 unique elements (Element)
Table 9: Results for insertion of 500000 unique elements (Element)
Table 10: Results for insertion of 10000 non-unique elements (Element)
Table 11: Results for iteration of 10000 elements (Long)
Table 12: Results for iteration of 100000 elements (Long)
Table 13: Results for iteration of 1000000 elements (Long)
Table 14: Results for iteration of 10000 elements (Element)
Table 15: Results for iteration of 100000 elements (Element)
Table 16: Results for iteration of 1000000 elements (Element)

Appendix C
CHANGE LOG

ID  Description                   User               Date
1   First Draft of the benchmark  Kapil Viren Ahuja  2008-10-08
2   Published                     Kapil Viren Ahuja  2008-10-21