CSE 100 Advanced Data Structures

CSE 100 Advanced Data Structures
Overview of course requirements
Outline of CSE 100 topics
Review of trees
Helpful hints for team programming
Information about computer accounts

CSE 100 web pages
All information related to the course is available in the textbook or online, following links from the class home page: http://ieng6.ucsd.edu/~cs100f
You're responsible for knowing that information, so make a note of that URL and read what's there

Topics for the course!
In CSE 100, we will build on what you have already learned about programming: procedural and data abstraction, object-oriented programming, and elementary data structure and algorithm design, implementation, and analysis
We will build on that, and go beyond it, to learn about more advanced, high-performance data structures and algorithms:
  Balanced search trees: AVL, red-black, B-trees
  Binary tries and Huffman codes for compression
  Graphs as data structures, and graph algorithms
  Data structures for disjoint-subset and union-find algorithms
  More about hash functions and hashing techniques
  Randomized data structures: skip lists, treaps
  Amortized cost analysis
  The C++ standard template library (STL)

Data structures, in general
A data structure is... a structure that holds data
A data structure is an object that offers certain useful operations (its Application Programmer Interface, or API), for example storing, retrieving, and deleting data of a certain type
A data structure may offer certain performance guarantees on its operations, for example certain best-, worst-, or average-case time or space costs
To meet those performance guarantees, a data structure may need to be implemented in a particular way
In CSE 100 we will study the performance guarantees that are permitted by various data structure implementations
We will begin by reviewing trees...

A review of trees
A tree is a hierarchical (not just linear, and not unstructured!) data structure
A tree is a set of elements called nodes, structured by a "parent" relation:
  If the tree is nonempty, exactly one node in the set is the root of the tree
  The root of a tree is the unique node that has no parent
  Every node in the set except the root has exactly one other node that is its parent

Drawing trees
[Figure: a tree with nodes A through L; node A is the root]
The root goes at the top: here node A is the root of the tree (in Computerscienceland, trees grow upside down)
The parent of a node is drawn above that node, with a "link" or "edge" from the node to its parent: here node A is the parent of nodes B, C; and B, C are called the children of A
Some nodes have no children, and are called leaves of the tree: here nodes D, G, I, J, L are leaves

Tree terminology: some definitions
Children of a node P: the set of nodes that have P as parent
Descendant of a node P:
  If a node C is a child of P, then C is a descendant of P
  If a node C is a child of a descendant of P, then C is a descendant of P
Ancestor of a node C: if C is a descendant of P, then P is an ancestor of C
Root of a tree: the unique node in the tree with no parent
Leaves of a tree: the set of nodes with no children
Subtree:
  The empty tree is a subtree of every tree
  Any node of a tree together with its descendants is a subtree of the tree
Level or depth of a node (using zero-based counting):
  The level of the root is 0
  The level of any non-root node is 1 + the level of its parent (this is equal to the number of edges on the path from the root to the node)
Height of a node: the number of edges on the longest path from the node to a leaf
Height of a tree: the height of the root of the tree
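The level and height definitions above translate fairly directly into code. The following is a minimal sketch, assuming a hypothetical `Node` type with a parent pointer and a list of children; the names `Node`, `addChild`, `nodeDepth`, and `nodeHeight` are illustrative and not part of the course materials.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node type for illustration: each node records its parent
// (nullptr for the root) and its children.
struct Node {
    Node* parent = nullptr;
    std::vector<Node*> children;
};

// Make `child` a child of `parent` (helper assumed for this sketch).
void addChild(Node& parent, Node& child) {
    child.parent = &parent;
    parent.children.push_back(&child);
}

// Level (depth) of a node: the number of edges on the path from the root.
int nodeDepth(const Node* n) {
    assert(n != nullptr);
    int d = 0;
    while (n->parent != nullptr) {
        n = n->parent;
        ++d;
    }
    return d;
}

// Height of a node: the number of edges on the longest path down to a leaf.
// A leaf has no children, so its height is 0.
int nodeHeight(const Node* n) {
    assert(n != nullptr);
    int h = 0;
    for (const Node* c : n->children) {
        h = std::max(h, 1 + nodeHeight(c));
    }
    return h;
}
```

With this representation, the height of a tree is simply `nodeHeight` applied to its root.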

Binary trees
A binary tree is a tree in which every node has at most two children
A particular child of a binary tree node is either a left child or a right child
Recursive definition of "binary tree": either the empty tree, or a node together with left and right subtrees which are both binary trees
Examples: [Figure: example binary trees]
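The recursive definition suggests recursive code: one case for the empty tree, one case for a node with two subtrees. Here is a small sketch, assuming a hypothetical `BinaryNode` type in which a null pointer stands for the empty tree. Giving the empty tree height -1 is a common convention chosen here so that a single node has height 0; the slides do not state it.

```cpp
#include <algorithm>
#include <memory>

// Hypothetical binary tree node; a null pointer represents the empty tree.
struct BinaryNode {
    int key;
    std::unique_ptr<BinaryNode> left;
    std::unique_ptr<BinaryNode> right;
    explicit BinaryNode(int k) : key(k) {}
};

// Number of nodes: 0 for the empty tree, otherwise 1 plus both subtree sizes.
int treeSize(const BinaryNode* t) {
    if (t == nullptr) return 0;
    return 1 + treeSize(t->left.get()) + treeSize(t->right.get());
}

// Height in edges: -1 for the empty tree (assumed convention), so that a
// single-node tree has height 0.
int treeHeight(const BinaryNode* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(treeHeight(t->left.get()), treeHeight(t->right.get()));
}
```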

Important binary tree properties
Consider a completely filled binary tree (every level that has any nodes at all has as many nodes as possible):
  How many nodes at level 0?
  How many nodes at level 1?
  How many nodes at level 2?
  How many nodes at level 3?
Generalizing, how many nodes at level L?
And so, how many nodes in a completely filled binary tree of height H?
And so, what is the height of a completely filled binary tree with N nodes?

In a completely filled binary tree with N nodes...
Generalizing, how many nodes at level L?
  $2^L$
And so, how many nodes in a completely filled binary tree of height H?
  $N = \sum_{L=0}^{H} 2^L = 2^{H+1} - 1$
And so, what is the height of a completely filled binary tree with N nodes?
  $2^{H+1} = N + 1$
  $H + 1 = \log_2(N + 1)$
  so $H = O(\log N)$ and $H = \Omega(\log N)$, and so $H = \Theta(\log N)$
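These formulas are easy to check numerically. The sketch below uses hypothetical function names and assumes heights small enough that the counts fit in 64 bits; it computes the node counts with shifts and recovers the height from N by repeated halving.

```cpp
#include <cassert>
#include <cstdint>

// Nodes at level L of a completely filled binary tree: 2^L.
std::uint64_t nodesAtLevel(int level) {
    assert(level >= 0 && level < 63);
    return std::uint64_t{1} << level;
}

// Total nodes in a completely filled binary tree of height H: 2^(H+1) - 1.
std::uint64_t nodesInFilledTree(int height) {
    assert(height >= 0 && height < 63);
    return (std::uint64_t{1} << (height + 1)) - 1;
}

// Height of a completely filled binary tree with n nodes: log2(n + 1) - 1.
// Assumes n has the form 2^(H+1) - 1 for some H >= 0.
int heightOfFilledTree(std::uint64_t n) {
    int h = -1;
    for (std::uint64_t m = n + 1; m > 1; m >>= 1) {
        ++h;  // one halving per unit of log2(n + 1)
    }
    return h;
}
```

For example, a completely filled tree of height 3 has 1 + 2 + 4 + 8 = 15 nodes, and `heightOfFilledTree(15)` recovers 3.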

Reviewing big-O notation
Write: $g(N) = O(f(N))$
And say: "g(N) is big-O of f(N)" if there are positive constants $c$, $n_0$ such that for all $N \ge n_0$, $g(N) \le c\,f(N)$
... that is, g eventually grows no faster than f (times a constant)
... f gives an asymptotic upper bound on the rate of growth of g
... the order of g is at most the order of f

Reviewing big-omega notation
Write: $g(N) = \Omega(f(N))$
And say: "g(N) is big-omega of f(N)" if there are positive constants $c$, $n_0$ such that for all $N \ge n_0$, $g(N) \ge c\,f(N)$
... that is, g eventually grows at least as fast as f (times a constant)
... f gives an asymptotic lower bound on the rate of growth of g
... the order of g is at least the order of f

Reviewing big-theta notation
Write: $g(N) = \Theta(f(N))$
And say: "g(N) is big-theta of f(N)" if g(N) is both big-O and big-omega of f(N)
... f gives a good qualitative estimate (a tight bound) on the rate of growth of g
... the order of g is the same as the order of f
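A worked instance may help make the three definitions concrete. The function $g(N) = 3N + 5$ and the constants below are chosen purely for illustration and do not come from the slides:

```latex
\begin{align*}
g(N) &= 3N + 5, \quad f(N) = N \\
3N + 5 &\le 4N \quad \text{for all } N \ge 5, \text{ so } g(N) = O(f(N)) \text{ with } c = 4,\ n_0 = 5 \\
3N + 5 &\ge 3N \quad \text{for all } N \ge 1, \text{ so } g(N) = \Omega(f(N)) \text{ with } c = 3,\ n_0 = 1 \\
g(N) &= \Theta(f(N))
\end{align*}
```

Other choices of $c$ and $n_0$ work equally well; the definitions only require that some pair of positive constants exists.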

Generalizing binary trees: K-ary trees
A K-ary tree is a tree in which every node has at most K children
K=2 gives binary trees, K=3 gives ternary trees, etc.
Possible children of a node in a K-ary tree are sequentially ordered, left-to-right
Recursive definition of "K-ary tree": either the empty tree, or a node together with at most K subtrees which are all K-ary trees
Examples of useful K-ary (K>2) trees we will cover later: 2-3 trees, B-trees, alphabet tries
Basic properties of binary trees generalize to properties of K-ary trees...

In a completely filled K-ary tree with N nodes...
Generalizing, how many nodes at level L?
  $K^L$
And so, how many nodes in a completely filled K-ary tree of height H?
  $N = \sum_{L=0}^{H} K^L = \frac{K^{H+1} - 1}{K - 1}$
And so, what is the height of a completely filled K-ary tree with N nodes?
  $K^{H+1} = N(K - 1) + 1$
  $H + 1 = \log_K\bigl(N(K - 1) + 1\bigr)$
  so $H = O(\log N)$ and $H = \Omega(\log N)$, and so $H = \Theta(\log N)$
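The K-ary formulas can be checked the same way as the binary ones. This sketch uses hypothetical function names and assumes inputs small enough to avoid 64-bit overflow; it sums the level sizes directly and solves $K^{H+1} = N(K-1) + 1$ for H by repeated division.

```cpp
#include <cassert>
#include <cstdint>

// Total nodes in a completely filled K-ary tree of height H:
// sum of K^L for L = 0..H, which equals (K^(H+1) - 1) / (K - 1).
std::uint64_t nodesInFilledKaryTree(std::uint64_t k, int height) {
    assert(k >= 2 && height >= 0);
    std::uint64_t total = 0;
    std::uint64_t levelCount = 1;  // K^L for the current level L
    for (int level = 0; level <= height; ++level) {
        total += levelCount;
        levelCount *= k;
    }
    return total;
}

// Height of a completely filled K-ary tree with n nodes:
// log_K(n(K - 1) + 1) - 1. Assumes n is a valid completely-filled node count.
int heightOfFilledKaryTree(std::uint64_t k, std::uint64_t n) {
    assert(k >= 2 && n >= 1);
    int h = -1;
    for (std::uint64_t m = n * (k - 1) + 1; m > 1; m /= k) {
        ++h;  // one division by K per unit of log_K
    }
    return h;
}
```

For K = 3 and H = 2 this gives 1 + 3 + 9 = 13 nodes, matching (27 - 1) / 2.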

Tree properties, continued
A completely filled K-ary tree with N nodes has the minimum height possible of any K-ary tree with N nodes...
In fact for any K-ary tree, $H \ge \log_K\bigl(N(K - 1) + 1\bigr) - 1$, and so $H \ge \log_K N - 2$
But what is the maximum height possible for a K-ary tree with N nodes?

Tree properties, good and bad
Many interesting operations on tree data structures have a time cost that, in the worst case, is proportional to the height of the tree
  Find, insert, and delete operations in search trees, insert and delete-root operations in heaps, etc.
For a completely filled tree, that means that these operations have a worst-case time cost that is a logarithmic function of the number of nodes N in the tree
  log(N) grows very slowly as a function of N, which is good!
  This fact is one of the main reasons that trees are an important data structure
But for a worst-case tree, that means that these operations have a worst-case time cost that is a linear function of the number of nodes N in the tree
  This means the time cost grows proportionally to N, which is not very good!
  This fact is a major problem with using trees in many applications, but it can be overcome, as we will see

The importance of being balanced
A binary tree of N nodes is considered balanced if its height is close to $\log_2 N$
The usual simple algorithm for inserting nodes in a search tree can produce unbalanced trees, which lead to poor performance...
  and this can easily happen in practice: for example, it will happen if the keys to be inserted are sorted or almost sorted
With cleverer insert operations, you can make sure the tree is always balanced no matter what, and guarantee excellent worst-case performance...
  at the cost of a more complicated implementation
We will first look at implementation issues for binary search trees in general
Then later we will look at a few approaches to improving performance of binary search trees: AVL trees, red-black trees, B-trees, splay trees, randomized search trees
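The effect of sorted input on the simple insert algorithm can be seen in a few lines of code. What follows is a minimal sketch, not the course's binary search tree class: `BstNode`, `bstInsert`, `bstHeight`, and `heightAfterInserting` are hypothetical names, and duplicate keys are simply ignored.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical BST node for illustration.
struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// The usual simple (non-balancing) insert: walk down from the root and
// attach the new key at the first empty spot. Duplicates are ignored.
void bstInsert(std::unique_ptr<BstNode>& root, int key) {
    std::unique_ptr<BstNode>* cur = &root;
    while (*cur) {
        if (key < (*cur)->key) {
            cur = &(*cur)->left;
        } else if ((*cur)->key < key) {
            cur = &(*cur)->right;
        } else {
            return;  // key already present
        }
    }
    *cur = std::make_unique<BstNode>(key);
}

// Height in edges; the empty tree has height -1 (assumed convention).
int bstHeight(const BstNode* t) {
    if (!t) return -1;
    return 1 + std::max(bstHeight(t->left.get()), bstHeight(t->right.get()));
}

// Build a tree by inserting the keys in the given order; return its height.
int heightAfterInserting(const std::vector<int>& keys) {
    std::unique_ptr<BstNode> root;
    for (int k : keys) bstInsert(root, k);
    return bstHeight(root.get());
}
```

Inserting the seven keys 1 through 7 in sorted order yields a chain of height 6, while the order 4, 2, 6, 1, 3, 5, 7 yields a completely filled tree of height 2.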

Working in teams
In CSE 100 this quarter, you are allowed and encouraged to write your programming assignments in teams of 2
This can work out very well, if members of the team have compatible skills, schedules, and personalities, and if you follow some basic software engineering principles
(Otherwise, it won't work well, and you will be better off working on your own)
I will sketch two approaches that you can use:
  The standard software life cycle
  Extreme programming
(More information is available on the web and elsewhere)

The standard software life cycle
Some variant of this is used in most software projects. Basic steps:
Requirements
  Get the requirements for the software from the customer (or in CSE 100 from the assignment README...)
  Make sure they are clear and that you understand them!
Design
  Before coding, create a design for a software system that will meet the requirements
  Use good design principles: top-down decomposition, abstraction, modularity, information hiding, etc.
Code
  Divide the coding task among members of the team, and code according to a common standard (including style and comments)
Test
  Thoroughly test each unit (block, method, class, package) and the entire system
Deploy
  Deliver the completed system (or in CSE 100, turn in your assignment)

Extreme programming (XP)
A new and somewhat different software engineering approach, very successful when used for small software projects (say, fewer than 16 programmers). Some XP practices:
The planning process
  Rank desired system features by importance, determined by need and cost
Pair programming
  Write all code in pairs, two programmers working together at one machine
  Driver controls the keyboard and mouse and types code; Observer makes suggestions, identifies problems, thinks strategically. Both brainstorm as needed. Switch roles often
Small releases
  Put a simple system into production early, and update it frequently on a short cycle
Test first and often
  XP teams focus on validation of the software at all times. Write unit tests first, then write code that fulfills the requirements reflected in the tests
Refactoring
  XP teams improve the design of the system throughout the entire development. This is done by keeping the software clean: without duplication, simple yet complete

Basic software principles
Whether you use XP or a more traditional approach, some things to keep in mind:
Top-down, divide-and-conquer design strategies are useful; break the overall problem down into small, manageable subproblems
Using object-oriented design, think of solving the problem using objects that have certain properties (instance variables) and behaviors (instance methods)
In your design, be clear about what is the interface to each piece of your solution
  In object-oriented design, the interface to a piece of the solution consists of the public (static or instance) methods of a class; be clear about PRE and POST conditions, and what arguments, return values, and side effects each method has
With a good design, each member of the team can, in principle, work on the implementation of a piece of the solution separately; as long as the implementation meets the interface specification, the system will work
A strategy: at first, use implementations that are fast and easy to write, but that work. Later, improve the implementation for better performance, while keeping the interface the same
Be sure to test and debug your solution! This may lead to a redesign, but hopefully it involves only small changes to the implementation
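As one possible illustration of these points, the sketch below documents PRE and POST conditions on the public methods of a small class and starts with an implementation that is easy to write. The class `IntSet` and its unsorted-vector representation are hypothetical choices for this example; the private data could later be swapped for a balanced search tree without changing the interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class for illustration: the interface is its public methods.
class IntSet {
public:
    // PRE: none.
    // POST: contains(x) is true; returns true if and only if x was newly added.
    bool insert(int x) {
        if (contains(x)) return false;
        items_.push_back(x);
        assert(contains(x));  // check the postcondition in debug builds
        return true;
    }

    // PRE: none.
    // POST: the set is unchanged; returns true if and only if x is in the set.
    bool contains(int x) const {
        return std::find(items_.begin(), items_.end(), x) != items_.end();
    }

    // PRE: none.
    // POST: the set is unchanged; returns the number of distinct elements.
    std::size_t size() const { return items_.size(); }

private:
    // First implementation: an unsorted vector, so operations cost O(N).
    // Callers depend only on the public methods above.
    std::vector<int> items_;
};
```

Because teammates code against the documented conditions rather than the vector, one person can write tests for `IntSet` while another replaces its internals.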

Getting started
Read chapter 1 of the required textbook for a review of some mathematical concepts, and an introduction to C++
Read chapter 2 for a review of asymptotic analysis and big-O notation
Read chapter 4 for a review of trees
We will then start with a C++ implementation of a binary search tree class template

Class accounts
If you are using your UCSD email account user name for CSE 100:
  password is your UCSD email account password
  be sure to run the command prep cs100f each time you log in to ieng6, to access course-specific paths, etc.
If you are using a cs100 account for CSE 100:
  password is your UCSD email account password
  also be sure to prep cs100f after logging in
If you don't know which accounts you have, use the ACS account lookup tool!
For the Moodle discussion board: use your UCSD email account user name, with your email password

Next time...
An introduction to C++
Comparing Java and C++
Basic C++ programming
C++ primitive types and operators
Arrays, pointers and pointer arithmetic
The interface/implementation distinction in C++
C++ class templates
Reading: Weiss Ch 1