COM1020/COM6101: Further Java Programming

Similar documents
TECHNICAL WHITEPAPER. Performance Evaluation Java Collections Framework. Performance Evaluation Java Collections. Technical Whitepaper.

CMSC 132: Object-Oriented Programming II. Hash Tables

Java Collections Framework reloaded

The class Object. Lecture CS1122 Summer 2008

Domain-Driven Design Activity

Regular Expressions ("reguläre Ausdrücke", "regexp")

Review. CSE 143 Java. A Magical Strategy. Hash Function Example. Want to implement Sets of objects Want fast contains( ), add( )

Compsci 201 Hashing. Jeff Forbes February 7, /7/18 CompSci 201, Spring 2018, Hashiing

CS11 Java. Winter Lecture 8

Java Collections. Readings and References. Collections Framework. Java 2 Collections. CSE 403, Spring 2004 Software Engineering

CS61B Lecture #24: Hashing. Last modified: Wed Oct 19 14:35: CS61B: Lecture #24 1

Prelim 2. CS 2110, 16 November 2017, 7:30 PM Total Question Name Short Heaps Tree Collections Sorting Graph

CSE 143. Lecture 28: Hashing

1.00/1.001 Introduction to Computers and Engineering Problem Solving. Final Exam

Announcements. Container structures so far. IntSet ADT interface. Sets. Today s topic: Hashing (Ch. 10) Next topic: Graphs. Break around 11:45am

NAME: c. (true or false) The median is always stored at the root of a binary search tree.

Announcements. Today s topic: Hashing (Ch. 10) Next topic: Graphs. Break around 11:45am

MIT AITI Lecture 18 Collections - Part 1

Prelim 2 SOLUTION. CS 2110, 16 November 2017, 7:30 PM Total Question Name Short Heaps Tree Collections Sorting Graph

Topic #9: Collections. Readings and References. Collections. Collection Interface. Java Collections CSE142 A-1

Collections, Maps and Generics

Java Persistence API (JPA) Entities

Building Java Programs

Hashing as a Dictionary Implementation

Java Collections. Readings and References. Collections Framework. Java 2 Collections. References. CSE 403, Winter 2003 Software Engineering

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Habanero Extreme Scale Software Research Project

27/04/2012. Objectives. Collection. Collections Framework. "Collection" Interface. Collection algorithm. Legacy collection

SUMMARY INTRODUCTION COLLECTIONS FRAMEWORK. Introduction Collections and iterators Linked list Array list Hash set Tree set Maps Collections framework

Announcements. Submit Prelim 2 conflicts by Thursday night A6 is due Nov 7 (tomorrow!)

Not overriding equals

Advanced Programming - JAVA Lecture 4 OOP Concepts in JAVA PART II

Implementation. Learn how to implement the List interface Understand the efficiency trade-offs between the ArrayList and LinkedList implementations

Class Libraries. Readings and References. Java fundamentals. Java class libraries and data structures. Reading. Other References

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

CS2110: Software Development Methods. Maps and Sets in Java

USAL1J: Java Collections. S. Rosmorduc

java.lang.object: Equality

Hash tables. hashing -- idea collision resolution. hash function Java hashcode() for HashMap and HashSet big-o time bounds applications

Outline. 1 Hashing. 2 Separate-Chaining Symbol Table 2 / 13

Java Collections. Wrapper classes. Wrapper classes

Java Collections. Engi Hafez Seliem

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

CS 3410 Ch 20 Hash Tables

Total Score /1 /20 /41 /15 /23 Grader

EXAMINATIONS 2016 TRIMESTER 2

11-1. Collections. CSE 143 Java. Java 2 Collection Interfaces. Goals for Next Several Lectures

CS 310: Hash Table Collision Resolution

Hashing. Reading: L&C 17.1, 17.3 Eck Programming Course CL I

Abstract Data Types (ADTs) Queues & Priority Queues. Sets. Dictionaries. Stacks 6/15/2011

CS 455 Midterm Exam 2 Fall 2016 [Bono] November 8, 2016

The Object Class. java.lang.object. Important Methods In Object. Mark Allen Weiss Copyright 2000

CS 310: Hashing Basics and Hash Functions

Collections class Comparable and Comparator. Slides by Mark Hancock (adapted from notes by Craig Schock)

Lecture 15 Summary. Collections Framework. Collections class Comparable and Comparator. Iterable, Collections List, Set Map

Fundamental Java Methods

Object-Oriented Programming with Java

Collections. Powered by Pentalog. by Vlad Costel Ungureanu for Learn Stuff

Standard ADTs. Lecture 19 CS2110 Summer 2009

Topic 22 Hash Tables

CS 310: Maps and Sets and Trees

CMPS 290G Topics in Software Engineering Winter 2004 Software Validation and Defect Detection Homework 5

EXAMINATIONS 2015 COMP103 INTRODUCTION TO DATA STRUCTURES AND ALGORITHMS

Announcements. Hash Functions. Hash Functions 4/17/18 HASHING

CS 310: Maps and Sets

Class 32: The Java Collections Framework

Introduction to Hashing

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

CS2110: Software Development Methods. Maps and Sets in Java

CS 310 Advanced Data Structures and Algorithms

CS61B Fall 2015 Guerrilla Section 3 Worksheet. 8 November 2015

CONTAİNERS COLLECTİONS

STANDARD ADTS Lecture 17 CS2110 Spring 2013

Software 1 with Java. Recitation No. 6 (Collections)

CIT-590 Final Exam. Name: Penn Key (Not ID number): If you write a number above, you will lose 1 point

COMP 250. Lecture 26. maps. Nov. 8/9, 2017

Comparing Objects 3/28/16

COMP 250. Lecture 32. interfaces. (Comparable, Iterable & Iterator) Nov. 22/23, 2017

EXAMINATIONS 2017 TRIMESTER 2

(f) Given what we know about linked lists and arrays, when would we choose to use one data structure over the other?

CS 307 Final Spring 2010

Today's Agenda. > To give a practical introduction to data structures. > To look specifically at Lists, Sets, and Maps

Chapter 11: Collections and Maps

Collections Questions

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 6. The Master Theorem. Some More Examples...

CSC 1214: Object-Oriented Programming

C12a: The Object Superclass and Selected Methods

Canonical Form. No argument constructor Object Equality String representation Cloning Serialization Hashing. Software Engineering

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

[Ref: Core Java Chp 13, Intro to Java Programming [Liang] Chp 22, Absolute Java Chp 16, docs.oracle.com/javase/tutorial/collections/toc.

Advanced Topics on Classes and Objects

Sets and Maps. Part of the Collections Framework

Comparing Objects 3/14/16

1B1b Implementing Data Structures Lists, Hash Tables and Trees

CSIS 10B Assignment 15 Maps and Hashing Due: next Wed

Prelim 2 Solution. CS 2110, November 19, 2015, 5:30 PM Total. Sorting Invariants Max Score Grader

Prelim 2. CS 2110, November 20, 2014, 7:30 PM Extra Total Question True/False Short Answer

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 7. (A subset of) the Collections Interface. The Java Collections Interfaces

CS2210 Data Structures and Algorithms

The dictionary problem

Transcription:

(1/19) COM1020/COM6101: Further Java Programming AKA: Object-Oriented Programming, Advanced Java Programming http://www.dcs.shef.ac.uk/ sjr/com1020/ Lecture 7: Collections Accessed by Content Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

(2/19) Reading Horstmann and Cornell, Core Java 2 vol. 2, chapter 2 Sun Java Tutorial: Java Collections Framework (http://www.javasoft.com/docs/books/tutorial/collections/) JavaWorld: Get started with the Java Collections Framework (http://www.javaworld.com/javaworld/jw-11-1998/jw-11- collections.html) Java Solutions: The Java 2 Collections (http://www.cuj.com/java/articles/a1.html)

(3/19) Objectives This lecture will consider collections in which you primarily access elements by their content: eg, finding words in a dictionary Lecture Content: Hash Tables and Hashing HashSet and the Set interface Accessing by content with sorting for free: TreeSet The Comparable interface

(4/19) A Dictionary Consider this problem: Given a text, create a set of all the words used in that text. For example, the (short) text: the element in the collection in the program contains the set {the, element, in, collection, program To do this we need to maintain the set of words that have been observed. For each new word, we need to check if we have already seen it, and if not add it to the set. (To avoid thinking about I/O, tokenization, etc. assume that the input text is stored as a list of tokenized strings.)

(5/19) A Solution The Collection interface has the appropriate method, contains. Using an ArrayList implementation we have: Collection inputtext; Collection wordset = new ArrayList(); String wd; for(iterator i = inputtext.iterator; i.hasnext();){ wd = (String) i.next(); if(wordset.contains(wd) == false) wordset.add(wd); This is not efficient. ArrayList.contains must step through the ArrayList, applying method equals to each element returning when either: 1. A matching element is found 2. The end of the list is reached. LinkedList would have similar (in)efficiency.

(6/19) Hash Tables A hash table is a data structure for access by content. An integer hash code is computed quickly for each object. For a string, the hash code would be some function of the characters that make up the string. A hash table is an array of linked lists, each list is called a bucket To find the index of an object (ob j) in a table of N buckets: index = hash(ob j)%n For example the default hash table has 101 buckets; "Java".hashCode() = 2301506 So "Java" would have index 2301506%101 = 19 Insertion: Once the index is computed, look in that bucket: Bucket is empty, so insert object into bucket Hash Collision: bucket has elements, must compare with all bucket elements to see if object is already present

(7/19) data next data next data next data next data next data next data next

(8/19) Controlling Hash Tables Bucket Count How many buckets are needed? There is a lot of theory on this; these two reasonable rules of thumb reduce the probability of collisions without wasting too much space: 1.5 as many buckets as expected number of elements Make the bucket count a prime number Rehashing When a hash table gets too full, a new table (with more buckets) is created, all elements inserted into the new table and the old table is discarded. Load Factor The load factor determines when the table is too full. Eg, when the load factor is 0.75, the table is rehashed when it is over 75% full HashSet class The Java Collections Framework supplies a class called HashSet which does all this for you. It will provide default values for the initial capacity (101) and load factor (0.75)

(9/19) Using HashSet Collection inputtext; Set wordset = new HashSet(); String wd; for(iterator i = inputtext.iterator; i.hasnext();){ wd = (String) i.next(); // no need for wordset.contains(wd) wordset.add(wd); HashSet implements the Set interface Set.add adds the specified element to this set if it is not already present HashSet.contains is more efficient than ArrayList.contains! If we know we are likely to have about 20 000 elements in the HashSet we can set the initial capacity accordingly: // set capacity to the smallest prime > 1.5*20,000 Set wordset = new HashSet(30011);

(10/19) Hashing Your Own Objects wordset.add(wd) works by calling the String.hashCode method to work out where to insert the item in the hash table String.hashCode obtains its value by computing a function of the characters in the string hashcode is defined in Object (although - as usual - the default definition is not very useful. Always define a hashcode method for objects you will insert in a hash table hashcode should be a function that combines the (hash codes of) the relevant identifiers for the object. equals must also be redefined, so that the hash table knows how to tell if an object has already been inserted in the table These definitions must be compatible: If x.equals(y) then x.hashcode() == y.hashcode()

(11/19) Hashing Example (1) public class Person { public int hashcode() { return 13*surname.hashCode() + 17*firstName.hashCode(); public boolean equals(object obj) { if(obj == null getclass()!= obj.getclass()) return false; Person per = (Person)obj; return surname.equals(per.surname) && firstname.equals(per.firstname); private String surname; private String firstname;

(12/19) Hashing Example (2) public class AutoPart { public int hashcode() { return 13*partID; public boolean equals(object obj) { if(obj == null getclass()!= obj.getclass()) return false; AutoPart part = (AutoPart)obj; return partid == part.partid; // unique part number private int partid;

(13/19) TreeSet TreeSet also implements the Set interface, and is collection of objects that may be accessed by content TreeSet is a sorted collection unlike HashSet: Elements may be inserted in any order; Iterating through the collection returns them in sorted order TreeSet sortedwords = new TreeSet(); sortedwords.add("amazing"); sortedwords.add("allowable"); sortedwords.add("antimatter"); sortedwords.add("abacus"); sortedwords.add("aardvark"); for(iterator iter = sortedwords.iterator(); iter.hasnext();) System.out.println(iter.next());

(14/19) TreeSet Implementation The sorting is accomplished by storing things in a tree structure (typically a red-black tree) Every time an element is added to the tree it is placed in position to that the collection remains sorted Adding is slower than adding to a hash table, but much faster than adding to the right place in an array or linked list For a TreeSet of size n, add (or contains) each require about log 2 (n) comparisons Collection inputtext; Set wordset = new HashSet(); String wd; for(iterator i = inputtext.iterator; i.hasnext();){ wd = (String) i.next(); // no need for wordset.contains(wd) wordset.add(wd);

(15/19) Timing Comparisons The input text was the complete text of an issue of The Guardian newspaper, from April 1995: 119 604 words; 23 727 unique words. Collection Running Time/s ArrayList 87.4 Vector 92.8 LinkedList 121.5 HashSet 0.2 TreeSet 0.4

(16/19) The Comparable Interface Objects to be stored in TreeSet must implement the Comparable interface The Comparable interface defines a single method: int compareto(object obj) a.compareto(b) returns: 0 if a and b are equal; Negative integer if a comes before b; Positive integer if a comes after b. String implements Comparable. String.compareTo returns the strings in lexicographic (alphabetic) order To insert your own objects you must implement Comparable

(17/19) AutoPart.compareTo and Person.compareTo public class AutoPart implements Comparable { public int compareto(object obj) { return partid ((AutoPart)obj).partID; public class Person implements Comparable { public int compareto(object obj){ Person per = (Person)obj; if (surname.equals(per.surname)) return firstname.compareto(per.firstname); return surname.compareto(per.surname);

(18/19) Summary If you need to access elements of a collection by content, then HashSet and TreeSet are appropriate HashSet computes an index for an element based on its hash code TreeSet stores elements in sorted order in a tree Elements stored in a TreeSet should implement the Comparable interface Elements stored in a HashSet should define hashcode and equals appropriately All this is put together in the complete implementations of Person and AutoPart (on the web pages) Next Lecture: Maps and More

(19/19) Exercises Modify the ContactDetails class, so that ContactDetails objects may be stored in a HashSet or a TreeSet Write a test program that creates several ContactDetails objects and stores them in a HashSet Now store the ContactDetails objects in a tree set and prints them out in sorted order.