UNIT 6A Organizing Data: Lists. Last Two Weeks

Similar documents
UNIT 6A Organizing Data: Lists Principles of Computing, Carnegie Mellon University

UNIT 6B Organizing Data: Hash Tables. Comparing Algorithms

UNIT 6B Organizing Data: Hash Tables. Announcements

UNIT 6B Organizing Data: Hash Tables. Comparing Algorithms

Revision Statement while return growth rate asymptotic notation complexity Compare algorithms Linear search Binary search Preconditions: sorted,

INSTITUTE OF AERONAUTICAL ENGINEERING

12 Abstract Data Types

15-110: Principles of Computing, Spring Problem Set 6 (PS6) Due: Friday, March 2 by 2:30PM via Gradescope Hand-in

End-Term Examination Second Semester [MCA] MAY-JUNE 2006

ECE250: Algorithms and Data Structures Midterm Review

15110 PRINCIPLES OF COMPUTING SAMPLE EXAM 2

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

Stacks, Queues and Hierarchical Collections

Summer Final Exam Review Session August 5, 2009

1 P a g e A r y a n C o l l e g e \ B S c _ I T \ C \

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001

1. Two main measures for the efficiency of an algorithm are a. Processor and memory b. Complexity and capacity c. Time and space d.

CMSC Introduction to Algorithms Spring 2012 Lecture 7

Course Review. Cpt S 223 Fall 2009

! Determine if a number is odd or even. ! Determine if a number/character is in a range. - 1 to 10 (inclusive) - between a and z (inclusive)

Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

EC8393FUNDAMENTALS OF DATA STRUCTURES IN C Unit 3

l Determine if a number is odd or even l Determine if a number/character is in a range - 1 to 10 (inclusive) - between a and z (inclusive)

CSE030 Fall 2012 Final Exam Friday, December 14, PM

CS251-SE1. Midterm 2. Tuesday 11/1 8:00pm 9:00pm. There are 16 multiple-choice questions and 6 essay questions.

MCA SEM-II Data Structure

III Data Structures. Dynamic sets

! Determine if a number is odd or even. ! Determine if a number/character is in a range. - 1 to 10 (inclusive) - between a and z (inclusive)

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

ASSIGNMENTS. Progra m Outcom e. Chapter Q. No. Outcom e (CO) I 1 If f(n) = Θ(g(n)) and g(n)= Θ(h(n)), then proof that h(n) = Θ(f(n))

Stacks Fall 2018 Margaret Reid-Miller

Introduction p. 1 Pseudocode p. 2 Algorithm Header p. 2 Purpose, Conditions, and Return p. 3 Statement Numbers p. 4 Variables p. 4 Algorithm Analysis

Course Review for Finals. Cpt S 223 Fall 2008

Name CPTR246 Spring '17 (100 total points) Exam 3

Stating the obvious, people and computers do not speak the same language.

Stacks, Queues and Hierarchical Collections. 2501ICT Logan

CS301 - Data Structures Glossary By

CS 8391 DATA STRUCTURES

MULTIMEDIA COLLEGE JALAN GURNEY KIRI KUALA LUMPUR

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

Stacks. Revised based on textbook author s notes.

BRONX COMMUNITY COLLEGE of the City University of New York DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE. Sample Final Exam

Overview of Data. 1. Array 2. Linked List 3. Stack 4. Queue

DATA STRUCTURE : A MCQ QUESTION SET Code : RBMCQ0305

Data Structures Question Bank Multiple Choice

The Stack and Queue Types

FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- MARCH, 2012 DATA STRUCTURE (Common to CT and IF) [Time: 3 hours

R13. II B. Tech I Semester Supplementary Examinations, May/June DATA STRUCTURES (Com. to ECE, CSE, EIE, IT, ECC)

CSCE 2014 Final Exam Spring Version A

1 P age DS & OOPS / UNIT II

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

CS 112 Final May 8, 2008 (Lightly edited for 2011 Practice) Name: BU ID: Instructions GOOD LUCK!

First Semester - Question Bank Department of Computer Science Advanced Data Structures and Algorithms...

CS8391-DATA STRUCTURES

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

CSE 373 OCTOBER 11 TH TRAVERSALS AND AVL

CSE 332 Winter 2015: Midterm Exam (closed book, closed notes, no calculators)

CS200 Spring 2004 Midterm Exam II April 14, Value Points a. 5 3b. 16 3c. 8 3d Total 100

CS 112 Final May 8, 2008 (Lightly edited for 2012 Practice) Name: BU ID: Instructions

CPSC 331 Term Test #2 March 26, 2007

Module 1: Asymptotic Time Complexity and Intro to Abstract Data Types

#06 More Structures LIFO FIFO. Contents. Queue vs. Stack 3. Stack Operations. Pop. Push. Stack Queue Hash table

Postfix (and prefix) notation

Hashing. October 19, CMPE 250 Hashing October 19, / 25

Lecture Notes. char myarray [ ] = {0, 0, 0, 0, 0 } ; The memory diagram associated with the array can be drawn like this

ADVANCED DATA STRUCTURES USING C++ ( MT-CSE-110 )

CS24 Week 4 Lecture 2

Sorting Pearson Education, Inc. All rights reserved.

Data Structures. Chapter 06. 3/10/2016 Md. Golam Moazzam, Dept. of CSE, JU

Department of Computer Science and Engineering. CSE 2011: Fundamentals of Data Structures Winter 2009, Section Z

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

University of Waterloo Department of Electrical and Computer Engineering ECE250 Algorithms and Data Structures Fall 2014

Week 6. Data structures

Stacks and Queues. Stack - Abstract Data Type. Stack Applications. Stack - Abstract Data Type - C Interface

Basic Data Structures (Version 7) Name:

CS W3134: Data Structures in Java

DEEPIKA KAMBOJ UNIT 2. What is Stack?

Faculty of Science FINAL EXAMINATION COMP-250 A Introduction to Computer Science School of Computer Science, McGill University

Table of Contents. Chapter 1: Introduction to Data Structures... 1

COSC 2007 Data Structures II Final Exam. Part 1: multiple choice (1 mark each, total 30 marks, circle the correct answer)

Data Structures. Outline. Introduction Linked Lists Stacks Queues Trees Deitel & Associates, Inc. All rights reserved.

Assume you are given a Simple Linked List (i.e. not a doubly linked list) containing an even number of elements. For example L = [A B C D E F].

CS-141 Exam 2 Review April 4, 2015 Presented by the RIT Computer Science Community

CPSC 221: Algorithms and Data Structures ADTs, Stacks, and Queues

Introduction to Data Structure

Tail Calls. CMSC 330: Organization of Programming Languages. Tail Recursion. Tail Recursion (cont d) Names and Binding. Tail Recursion (cont d)

CSCI E 119 Section Notes Section 11 Solutions

Visit ::: Original Website For Placement Papers. ::: Data Structure

CSE373 Fall 2013, Second Midterm Examination November 15, 2013

CSE 373: Data Structures and Algorithms

Department of Computer Science and Technology

Arrays and Linked Lists

Types of Data Structures

APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY THIRD SEMESTER B.TECH DEGREE EXAMINATION, JULY 2017 CS205: DATA STRUCTURES (CS, IT)

Data Abstractions. National Chiao Tung University Chun-Jen Tsai 05/23/2012

GUJARAT TECHNOLOGICAL UNIVERSITY COMPUTER ENGINEERING (07) / INFORMATION TECHNOLOGY (16) / INFORMATION & COMMUNICATION TECHNOLOGY (32) DATA STRUCTURES

Csci 102: Sample Exam

1. Attempt any three of the following: 15

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination

COSC160: Data Structures: Lists and Queues. Jeremy Bolton, PhD Assistant Teaching Professor

Transcription:

UNIT 6A Organizing Data: Lists 1 Last Two Weeks Algorithms: Searching and sorting Problem solving technique: Recursion Asymptotic worst case analysis using the big O notation 2 1

This Week Observe: The choice of how to organize your data determines how efficiently you can do certain types of computation Data structures Arrays, linked lists, stacks, queues, hash tables, trees, graphs Representation of dynamic sets and supporting operations such as search, insert, delete 3 Data Structure The organization of data is a very important issue for computation. A data structure is a way of storing data in a computer so that it can be used efficiently. Choosing the right data structure will allow us to develop certain algorithms for that data that are more efficient. An array (or list) is a very simple data structure for holding a sequence of data. 4 2

Arrays in Memory Typically, array elements are stored in adjacent memory cells. The subscript (or index) is used to calculate an offset to find the desired element. Example: data = [50, 42, 85, 71, 99] Assume integers are stored using 4 bytes (32 bits). If we want data[3], the computer takes the address of the start of the array and adds the offset * the size of an array element to find the element we want. Address 100 104 108 112 116 Contents 50 42 85 71 99 Do you see why the first index of an array is 0 now? Location of data[3] is 100 + 3*4 = 112 5 Pros: Arrays: Pros and Cons Access to an array element is fast since we can compute its location quickly. Cons: If we want to insert or delete an element, we have to shift subsequent elements which slows our computation down. We need a large enough block of memory to hold our array. 6 3

Linked Lists Another data structure that stores a sequence of data values is the linked list. Data values in a linked list do not have to be stored in adjacent memory cells. To accommodate this feature, each data value has an additional pointer that indicates where the next data value is in computer memory. In order to use the linked list, we only need to know where the first data value is stored. 7 Linked List Example Linked list to store the sequence: 50, 42, 85, 71, 99 address data next Assume each integer and pointer requires 4 bytes. Starting Location of List (head) 124 100 42 148 108 99 0 (null) 116 124 50 100 132 71 108 140 148 85 132 156 8 4

Linked List Example To insert a new element, we only need to change a few pointers. address data next Example: 100 42 156 Insert 20 108 99 0 (null) after 42. 116 Starting Location of List (head) 124 124 50 100 132 71 108 140 148 85 132 156 20 148 9 Drawing Linked Lists Abstractly L = [50, 42, 85, 71, 99] head 50 42 85 71 99 null Inserting 20 after 42: head step 2 20 step 1 We link the new node to the list before breaking the existing link. 50 42 85 71 99 null 10 5

Pros: Linked Lists: Pros and Cons Inserting and deleting data does not require us to move/shift subsequent data elements. Cons: If we want to access a specific element, we need to traverse the list from the head of the list to find it which can take longer than an array access. Linked lists require more memory. (Why?) 11 Two dimensional arrays Some data can be organized efficiently in a table (also called a matrix or 2 dimensional array) Each cell is denoted with two subscripts, B 0 1 2 3 4 a row and column 0 3 18 43 49 65 indicator 1 14 30 32 53 75 2 9 28 38 50 73 B[2][3] = 50 3 10 24 37 58 62 4 7 19 40 46 66 12 6

2D Arrays in Ruby data = [ [ 1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ] 0 1 2 3 0 1 2 3 4 1 5 6 7 8 2 9 10 11 12 data[0] => [1, 2, 3, 4] data[1][2] => 7 data[2][5] => nil data[4][2] => undefined method '[]' for nil 13 2D Array Example in Ruby Find the sum of all elements in a 2D array def summatrix(table) number of rows in the table sum = 0 for row in 0..table.length-1 do for col in 0..table[row].length-1 do sum = sum + table[row][col] return sum number of columns in the given row of the table 14 7

Tracing the Nested Loop for row in 0..table.length-1 do for col in 0..table[row].length-1 do sum = sum + table[row][col] 0 1 2 3 0 1 2 3 4 1 5 6 7 8 2 9 10 11 12 table.length = 3 table[row].length = 4 for every row row col sum 0 0 1 0 1 3 0 2 6 0 3 10 1 0 15 1 1 21 1 2 28 1 3 36 2 0 45 2 1 55 2 2 66 2 3 78 15 Stacks A stack is a data structure that works on the principle of Last In First Out (LIFO). LIFO: The last item put on the stack is the first item that can be taken off. Common stack operations: Push put a new element on to the top of the stack Pop remove the top element from the top of the stack Applications: calculators, compilers, programming 16 8

RPN Some modern calculators use Reverse Polish Notation (RPN) Developed in 1920 by Jan Lukasiewicz Computation of mathematical formulas can be done without using any parentheses Example: ( 3 + 4 ) * 5 = becomes in RPN: 3 4 + 5 * 17 RPN Example Convert the following standard mathematical expression into RPN: (23 3) / (4 + 6) 23 3 4 6 + operand1 operand2 operator operand1 operand2 operator 23 3 4 6 + / operand1 operand2 operator 18 9

Evaluating RPN with a Stack i 0 A 23 3-4 6 + / yes x A[i] i == A.length? no Is x a number? yes Push x on S i i + 1 no Pop top 2 numbers Perform operation Push result on S 23 420 + / 63 10 = 10 20 = 2 S 6 10 3 4 20 23 2 Output Pop S Answer: 2 19 Example Step by Step RPN: 23 3-4 6 + / Stack Trace: 6 3 4 4 10 23 23 20 20 20 20 2 20 10

Stacks in Ruby You can treat arrays (lists) as stacks in Ruby. stack stack = [] [] stack.push(1) [1] stack.push(2) [1,2] stack.push(3) [1,2,3] x = stack.pop() [1,2] 3 x = stack.pop() [1] 2 x = stack.pop() [] 1 x = stack.pop() nil nil x 21 Queues A queue is a data structure that works on the principle of First In First Out (FIFO). FIFO: The first item stored in the queue is the first item that can be taken out. Common queue operations: Enqueue put a new element in to the rear of the queue Dequeue remove the first element from the front of the queue Applications: printers, simulations, networks 22 11

UNIT 6B Organizing Data: Hash Tables 23 Comparing Algorithms You are a professor and you want to find an exam in a large pile of n exams. Search the pile using linear search. Per student: O(n) Total for n students: O(n 2 ) Have an assistant sort the exams first by last name. Assistant s work: O(n log n) using merge sort Professor: Search for one student: O(log n) using binary search Total for n students: O(n log n) 24 12

Another way Set up a large number of buckets. Place each exam into a bucket based on some function. Example: 100 buckets, each labeled with a value from 00 to 99. Use the student s last two digits of their student ID number to choose the bucket. Ideally, if the exams get distributed evenly, there will be only a few exams per bucket. Assistant: O(n) putting n exams into the buckets Professor: O(1) search for an exam by going directly to the relevant bucket and searching through a few exams. 25 Hashing Universe of keys Hash Table key1 key3 key2 h(key1) h(key3) h(key2) A hash function h is used to map keys to hash-table slots 26 13

Strings and ASCII codes s = "hello" for i in 0..s.length-1 do print s[i], "\n" 104 101 108 108 111 You can treat a string like an array in Ruby. If you access the ith character, you get the ASCII code for that character. 27 Hash table Let s assume that we are going to store only lower case strings into an array (hash table). table1 = Array.new(26) => [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil] 28 14

Hash table We could pick the array position where each string is stored based on the first letter of the string using this hash function: def h(string) return string[0] - 97 The ASCII values of lowercase letters are: a > 97, b > 98, c > 99, d > 100, etc. 29 Inserting into Hash Table To insert into the hash table, we simply use the hash function h to determine which index ( bucket ) to store the element. def insert(table, name) table[h(name)] = name insert(table1, aardvark ) insert(table1, beaver )... 30 15

Hash function (cont d) Using the hash function h: aardvark would be stored in an array at index 0 beaver would be stored in an array at index 1 kangaroo would be stored in an array at index 10 whale would be stored in an array at index 22 table1 => ["aardvark", "beaver", nil, nil, nil, nil, nil, nil, nil, nil, "kangaroo", nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, "whale", nil, nil, nil] 31 Hash function (cont d) But if we try to insert bunny and bear into the hash table, each word overwrites the previous word since they all hash to index 1: >> insert(table1,"bunny") >> insert(table1,"bear") >> table1 => ["aardvark", "bear", nil, nil, nil, nil, nil, nil, nil, nil, "kangaroo", nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, "whale", nil, nil, nil] 32 16

Revised Hash table Let s make our hash table an array of arrays (an array of buckets ) Each bucket can hold more than one string. table2 = Array.new(26) for i in 0..25 do table2[i] = [] => [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []] 33 Revised insert function def insert(table, key) # find the bucket (array) in the table # array using the hash function h bucket = table[h(key)] # app the key string to the bucket # array bucket << key 34 17

Inserting into new hash table insert(table2, "aardvark") >> insert(table2, "beaver") >> insert(table2, "kangaroo") >> insert(table2, "whale") >> insert(table2, "bunny") >> insert(table2, "bear") >> table2 => [["aardvark"], ["beaver", "bunny", "bear"], [], [], [], [], [], [], [], [], ["kangaroo"], [], [], [], [], [], [], [], [], [], [], [], ["whale"], [], [], []] 35 Collisions beaver, bunny and bear all up in the same bucket. These are collisions in a hash table. The more collisions you have in a bucket, the more you have to search in the bucket to find the desired element. We want to try to minimize the collisions by creating a hash function that distribute the keys (strings) into different buckets as evenly as possible. 36 18

First Try def h(string) k = 0 for i in 0..string.length-1 do k = string[i] + k return k h( hello ) => 532 h( olleh ) => 532 Permutations still give same index (collision) and numbers are high. 37 Second Try def h(string) k = 0 for i in 0..string.length-1 do k = string[i] + k*256 return k h( hello ) => 448378203247 h( olleh ) => 478560413032 Better, but numbers are still high. We probably don t want to (or can t) create arrays that have indices this large. 38 19

Third Try def h(string, tablesize) k = 0 for i in 0..string.length-1 do k = string[i] + k*256 return k % tablesize We can use the modulo operator to take the large values and map them to indices for a smaller array. 39 Revised insert function def insert(table, key) # find the bucket (array) in the table # array using the hash function h bucket = table[h(key, table.length)] # app the key string to the bucket # array bucket << key 40 20

Final results table3 = Array.new(13) for i in 0..12 do table3[i] = [] => [[], [], [], [], [], [], [], [], [], [], [], [], []] >> insert(table3,"aardvark") >> insert(table3,"bear") >> insert(table3,"bunny") >> insert(table3,"beaver") >> insert(table3,"dog") >> table3 Still have one collision, but b-words are distributed better. => [[], [], [], [], [], [], [], [], [], ["bunny"], ["aardvark", "bear"], ["dog"], ["beaver"]] 41 Searching in a hash table To search for a key, use the hash function to find out which bucket it should be in, if it is in the table at all. def contains?(table, key) bucket = table[h(key,table.length)] for entry in bucket do return true if entry == key return false 42 21

Efficiency If the keys (strings) are distributed well throughout the table, then each bucket will only have a few keys and the search should take O(1) time. Example: If we have a table of size 1000 and we hash 4000 keys into the table and each bucket has approximately the same number of keys (approx. 4), then a search will only require us to look at approx. 4 keys => O(1) But, the distribution of keys is depent on the keys and the hash function we use! 43 22