Programming Languages and Techniques (CIS120)

Similar documents
Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

CIS 120 Midterm II November 16, Name (printed): Pennkey (login id):

CIS 120 Midterm II November 16, 2012 SOLUTIONS

Programming Languages and Techniques (CIS120)

Fall 2017 Mentoring 9: October 23, Min-Heapify This. Level order, bubbling up. Level order, bubbling down. Reverse level order, bubbling up

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Review. CSE 143 Java. A Magical Strategy. Hash Function Example. Want to implement Sets of objects Want fast contains( ), add( )

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Implementing Hash and AVL

Hash tables. hashing -- idea collision resolution. hash function Java hashcode() for HashMap and HashSet big-o time bounds applications

Data abstractions: ADTs Invariants, Abstraction function. Lecture 4: OOP, autumn 2003

CS 231 Data Structures and Algorithms, Fall 2016

Programming Languages and Techniques (CIS120)

Introduction to Programming System Design CSCI 455x (4 Units)

Announcements. Container structures so far. IntSet ADT interface. Sets. Today s topic: Hashing (Ch. 10) Next topic: Graphs. Break around 11:45am

INSTRUCTIONS TO CANDIDATES

Programming Languages and Techniques (CIS120)

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

1.00 Lecture 32. Hashing. Reading for next time: Big Java Motivation

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

Equality for Abstract Data Types

Introduction to Programming Using Java (98-388)

(f) Given what we know about linked lists and arrays, when would we choose to use one data structure over the other?

MIDTERM EXAM THURSDAY MARCH

COE318 Lecture Notes Week 4 (Sept 26, 2011)

Announcements. Today s topic: Hashing (Ch. 10) Next topic: Graphs. Break around 11:45am

CSC 1351: Final. The code compiles, but when it runs it throws a ArrayIndexOutOfBoundsException

Programming Languages and Techniques (CIS120)

Collections, Maps and Generics

EXAMINATIONS 2016 TRIMESTER 2

More Java Basics. class Vector { Object[] myarray;... //insert x in the array void insert(object x) {...} Then we can use Vector to hold any objects.

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

CMSC 433 Section 0101 Fall 2012 Midterm Exam #1

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

1.00/1.001 Introduction to Computers and Engineering Problem Solving. Final Exam

Chapter 11: Collections and Maps

CSE373 Fall 2013, Second Midterm Examination November 15, 2013

Programming Languages and Techniques (CIS120e)

Topic 22 Hash Tables

Compsci 201 Hashing. Jeff Forbes February 7, /7/18 CompSci 201, Spring 2018, Hashiing

Section 05: Solutions

EXAMINATIONS 2005 END-YEAR. COMP 103 Introduction to Data Structures and Algorithms

Programming Languages and Techniques (CIS120)

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger.

Lecture 7: Efficient Collections via Hashing

Canonical Form. No argument constructor Object Equality String representation Cloning Serialization Hashing. Software Engineering

The Java Type System (continued)

CIS 190: C/C++ Programming. Lecture 12 Student Choice

(b) Draw a hash table having 10 buckets. Label the buckets 0 through 9.

COMP 103 Introduction to Data Structures and Algorithms

CSE 331 Winter 2016 Midterm Solution

CSE 143 Sp03 Final Exam Sample Solution Page 1 of 13

CIS 120 Final Exam 15 December 2017

Announcements. 1. Forms to return today after class:

Java: advanced object-oriented features

Hashing. Reading: L&C 17.1, 17.3 Eck Programming Course CL I

Section 05: Midterm Review

inside: THE MAGAZINE OF USENIX & SAGE August 2003 volume 28 number 4 PROGRAMMING McCluskey: Working with C# Classes

Computer Science II Data Structures

Introduction Data Structures Nick Smallbone

COMP 250. Lecture 27. hashing. Nov. 10, 2017

Al al-bayt University Prince Hussein Bin Abdullah College for Information Technology Computer Science Department

Points off Total off Net Score. CS 314 Final Exam Spring 2017

Programming Languages and Techniques (CIS120)

Programming Languages and Techniques (CIS120)

(b) Draw a hash table having 10 buckets. Label the buckets 0 through 9.

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Programming Languages and Techniques (CIS120)

Lecture 18. Collision Resolution

For this section, we will implement a class with only non-static features, that represents a rectangle

COMP251: Algorithms and Data Structures. Jérôme Waldispühl School of Computer Science McGill University

CS61BL: Data Structures & Programming Methodology Summer 2014

Domain-Driven Design Activity

MARKING KEY The University of British Columbia MARKING KEY Computer Science 260 Midterm #1 Examination 12:30 noon, Tuesday, February 14, 2012

W09 CPSC 219 Midterm Exam Page 1 of 7

Prelim 2. CS 2110, November 20, 2014, 7:30 PM Extra Total Question True/False Short Answer

Lecture 6: Hashing Steven Skiena

COS 226 Algorithms and Data Structures Fall Midterm

CIS 120 Midterm II March 29, 2013 SOLUTIONS

COE318 Lecture Notes Week 3 (Sept 19, 2011)

EXAMINATIONS 2015 COMP103 INTRODUCTION TO DATA STRUCTURES AND ALGORITHMS

Exam 1 Prep. Dr. Demetrios Glinos University of Central Florida. COP3330 Object Oriented Programming

Object Oriented Programming: In this course we began an introduction to programming from an object-oriented approach.

Announcements HEAPS & PRIORITY QUEUES. Abstract vs concrete data structures. Concrete data structures. Abstract data structures

CIT-590 Final Exam. Name: Penn Key (Not ID number): If you write a number above, you will lose 1 point

3. Convert 2E from hexadecimal to decimal. 4. Convert from binary to hexadecimal

Announcements. Submit Prelim 2 conflicts by Thursday night A6 is due Nov 7 (tomorrow!)

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

USAL1J: Java Collections. S. Rosmorduc

Outline. Logistics. Logistics. Principles of Software (CSCI 2600) Spring Logistics csci2600/

Transcription:

Programming Languages and Techniques (CIS120) Lecture 37 April 22, 2016 Encapsulation & Hashing

How is the Game Project going so far? 1. not started 2. got an idea 3. submitted design proposal 4. started coding 5. it's somewhat working 6. it's mostly working 7. debugging / polishing 8. done!

Announcements Monday is the bonus lecture! No clickers required. "Code is Data" Game project due Tuesday at midnight, hard deadline. NO LATE PROJECTS WILL BE ACCEPTED.

Monday, May 9 th, 9-11AM FINAL EXAM Four locations, see webpage for details Comprehensive exam over course concepts: OCaml material (though we won t worry much about syntax) All Java material (emphasizing material since midterm 2) all course content old exams posted Closed book, but: One letter- sized, handwritten sheet of notes allowed Review Session: TBA

public class ResArray { } /** Constructor, takes no arguments. */ public ResArray() { } /** Access position i. If position i has not yet * been initialized, return 0. */ public int get(int idx) { } /** Update index i to contain the value v. */ public void set(int idx, int val) { } /** Return the extent of the array. i.e. one past the index of the last nonzero value in the array. */ public int getextent() { } ResArray a = new ResArray(); a.set(3,2); a.set(4,1); a.set(4,0); int result = a.getextent(); What should be the result? 1. 0 2. 3 3. 4 4. 5 5. ArrayIndexOutOfBoundsException 6. NullPointerException

Resizable Arrays public class ResArray { /** Constructor, takes no arguments. */ public ResArray() { } Object Invariant: extent is always 1 past the last nonzero value in data (or 0 if the array is all zeros) /** Access the array at position i. If position i has not yet * been initialized, return 0. */ public int get(int i) { } /** Modify the array at position i to contain the value v. */ public void set(int i, int v) { } /** Return the extent of the array. */ public int getextent() { } }

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); x.set(4,1); x.set(4,0); Stack Heap

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); x.set(4,1); x.set(4,0); x Stack Heap ResArray data extent 040 int[] length 0

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); x.set(4,1); x.set(4,0); x Stack Heap ResArray data extent 04 int[] length 0 int[] length 4 0 0 0 2

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); x.set(4,1); x.set(4,0); x Stack Heap ResArray data extent 04 int[] length 4 0 0 0 2

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); x.set(4,1); x.set(4,0); Stack Heap int[] x ResArray data length 8 0 0 0 2 1 0 0 0 extent 045 int[] length 4 0 0 0 2

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); x.set(4,1); x.set(4,0); Stack Heap int[] x ResArray data length 8 0 0 0 2 1 0 0 0 extent 045

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); x.set(4,1); x.set(4,0); Stack Heap int[] x ResArray data length 8 0 0 0 2 0 0 0 0 extent 04

Resizable Arrays public class ResArray { /** Constructor, takes no arguments. */ public ResArray() { } Object Invariant: extent is always 1 past the last nonzero value in data (or 0 if the array is all zeros) /** Access the array at position i. If position i has not yet * been initialized, return 0. */ public int get(int i) { } /** Modify the array at position i to contain the value v. */ public void set(int i, int v) { } /** Return the extent of the array. */ public int getextent() { } } /** An array containing all nonzero values */ public int[] values() { }

Values Method public class ResArray { private int[] data; private int extent = 0; Object Invariant: extent is always 1 past the last nonzero value in data (or 0 if the array is all zeros) } /** An array containing all nonzero values */ public int[] values() { return data; } Is this a good implementation of "values"?

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); int[] y = x.values(); y[3] = 0; x Stack Heap ResArray data extent 04 int[] length 4 0 0 0 2

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); int[] y = x.values(); y[3] = 0; x Stack Heap ResArray data extent 04 int[] length 4 0 0 0 2 y

ResArray ASM Workspace ResArray x = new ResArray(); x.set(3,2); int[] y = x.values(); y[3] = 0; x y Stack Heap ResArray data extent 04 int[] length 4 0 0 0 0 invariant violation!

Values method and encapsulation public int[] values() { int[] values = new int[extent]; for (int i=0; i<extent; i++) { values[i] = data[i]; } return values; } Use encapsulation to preserve invariants about the state of the object. All modification to the state of the object must be done using the object's own methods. Enforce encapsulation by not returning aliases to mutable data structures from methods.

Hash Sets & Hash Maps array- based implementation of sets and maps

Hash Sets and Maps: The Big Idea Combine: the advantage of arrays: efficient random access to its elements with the advantage of a map datastructure arbitrary keys (not just integer indices) How? Create an index into an array by hashing the data in the key to turn it into an int Java s hashcode method maps key data to ints Generally, the space of keys is much larger than the space of hashes, so, unlike array indices, hashcodes might not be unique

Hash Maps, Pictorially Keys hashcode Array Values John Doe 000 null 001 CSCI Jimmy Bob 002 null 003 CBE Jane Smith 253 DMD Joan Jones 254 255 null WUNG A schematic HashMap taking Strings (student names) to Undergraduate Majors. Here, John Doe.hashCode() returns an integer n, its hash, such that n mod 256 is 254.

Hash Collisions Uh Oh: Indices derived via hashing may not be unique! Jane Smith.hashCode() % 256 è 253 Joe Schmoe.hashCode() % 256 è 253 Good hashcode functions make it unlikely that two keys will produce the same hash But, it can happen that two keys do produce the same index that is, their hashes collide

Bucketing and Collisions Keys hashcode Array Buckets of Bindings John Doe 000 null 001 Jimmy Bob CSCI Jimmy Bob 002 null 003 Joan Jones CBE Jane Smith 253 Jane Smith Joe Shmoe DMD MATH Joan Jones 254 255 null John Doe WUNG Joe Schmoe Here, Jane Smith.hashCode() and Joe Schmoe.hashCode() happen to collide. The bucket at the corresponding index of the Hash Map array stores the map data.

Bucketing and Collisions Using an array of buckets Each bucket stores the mappings for keys that have the same hash. Each bucket is itself a map from keys to values (implemented by a linked list or binary search tree). The buckets can t use hashing to index the values instead they use key equality (via the key s equals method) To lookup a key in the Hash Map: First, find the right bucket by indexing the array through the key s hash Second, search through the bucket to find the value associated with the key Not the only solution to the collision problem

Hashing and User- defined Classes public class Point { private final int x; private final int y; public Point(int x, int y) { this.x = x; this.y = y; } public int getx() { return x; } public int gety() { return y; } } // somewhere in main Map<Point,String> m = new HashMap<Point,String>(); m.put(new Point(1,2), "House"); System.out.println(m.containsKey(new Point(1,2))); What gets printed to the console? 1. true 2. false 3. I have no idea

HashCode Requirements Whenever you override equals you must also override hashcode in a consistent way: whenever o1.equals(o2)== true you must ensure that o1.hashcode() == o2.hashcode() Why? Because comparing hashes is supposed to be a quick approximation for equality. Note: the converse does not have to hold: o1.hashcode() == o2.hashcode() does not necessarily mean that o1.equals(o2)

Example for Point public class Point { @Override public int hashcode() { final int prime = 31; int result = 1; result = prime * result + x; result = prime * result + y; return result; } } Examples: (new Point(1,2)).hashCode() yields 994 (new Point(2,1)).hashCode() yields 1024 Note that equal points have the same hashcode Why 31? Prime chosen to create more uniform distribution Note: eclipse can generate this code

Computing Hashes What is a good recipe for computing hash values for your own classes? intuition: smear the data throughout all the bits of the resulting integer 1. Start with some constant, arbitrary, non- zero int in result. 2. For each significant field f of the class (i.e. each field taken into account when computing equals), compute a sub hash code c for the field: For boolean fields: (f? 1 : 0) For byte, char, int, short: (int) f For long: (int) (f ^ (f >>> 32)) For references: 0 if the reference is null, otherwise use the hashcode() of the field. 3. Accumulate those subhashes into the result by doing (for each field s c): result = prime * result + c; 4. return result

Hash Map Performance Hash Maps can be used to efficiently implement Maps and Sets There are many different strategies for dealing with hash collisions with various time/space tradeoffs Real implementations also dynamically rescale the size of the array (which might require re- computing the bucket contents) If the hashcode function gives a good (close to uniform) distribution of hashes the buckets are expected to be small (only one or two elements) Performance depends on workload

Terminological Clash The word "hash" is also used in cryptography SHA- 1, SHA- 2, SHA- 3, MD5, etc. Cryptographic hashes are intended to reduce large byte sequences to short byte sequences Very hard to invert Should only rarely have collisions Are considerably more expensive to compute than hashcode (so not suitable for hash tables) Never use hashcode when you need a cryptographic hash! See CIS 331 for more details

Collections: take away lessons equals hashcode compareto

Collections Requirements All collections use equals Defaults to == (reference equality) Override equals to create structural equality Should be: false for distinct instance classes An equivalence relation: reflexive, symmetric, transitive HashSets/HashMaps use hashcode Override when equals is overridden Should be compatible with equals Should try to "distribute" the values uniformly Iterator not guaranteed to follow element order Ordered collections (TreeSet, TreeMap) need to implement Comparable<Object> Override compareto Should implement a total order Strongly recommended to be compatible with equals (i.e. o1.equals(o2) exactly when o1.compareto(o2) == 0)

Comparing Collection Performance HashTest.java