DATA STRUCTURE AND ALGORITHM USING PYTHON

Similar documents
We can use a max-heap to sort data.

UNIT 5C Merge Sort Principles of Computing, Carnegie Mellon University

UNIT 5C Merge Sort. Course Announcements

CP222 Computer Science II. Searching and Sorting

Component 02. Algorithms and programming. Sorting Algorithms and Searching Algorithms. Matthew Robinson

CSE 390a Lecture 7. Regular expressions, egrep, and sed

CSE 303 Lecture 7. Regular expressions, egrep, and sed. read Linux Pocket Guide pp , 73-74, 81

Faster Sorting Methods

CS1 Lecture 30 Apr. 2, 2018

SAMPLE OF THE STUDY MATERIAL PART OF CHAPTER 6. Sorting Algorithms

Sorting. CSE 143 Java. Insert for a Sorted List. Insertion Sort. Insertion Sort As A Card Game Operation. CSE143 Au

Sorting. CSC 143 Java. Insert for a Sorted List. Insertion Sort. Insertion Sort As A Card Game Operation CSC Picture

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms

Computer Science 4U Unit 1. Programming Concepts and Skills Algorithms

Sorting Algorithms. + Analysis of the Sorting Algorithms

Quick Sort. CSE Data Structures May 15, 2002

Sorting. Sorting in Arrays. SelectionSort. SelectionSort. Binary search works great, but how do we create a sorted array in the first place?

CSc 110, Spring 2017 Lecture 39: searching

Sorting Algorithms. Slides used during lecture of 8/11/2013 (D. Roose) Adapted from slides by

Sorting Pearson Education, Inc. All rights reserved.

Bubble sort is so named because the numbers are said to bubble into their correct positions! Bubble Sort

ECE 2574: Data Structures and Algorithms - Basic Sorting Algorithms. C. L. Wyatt

Algorithms and Applications

UNIT 7. SEARCH, SORT AND MERGE

PRAM Divide and Conquer Algorithms

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION

CS 171: Introduction to Computer Science II. Quicksort

CompSci 105. Sorting Algorithms Part 2

Overview of Sorting Algorithms

having any value between and. For array element, the plot will have a dot at the intersection of and, subject to scaling constraints.

100 points total. CSE 3353 Homework 2 Spring 2013

Searching and Sorting

1 a = [ 5, 1, 6, 2, 4, 3 ] 4 f o r j i n r a n g e ( i + 1, l e n ( a ) 1) : 3 min = i

About this exam review

Algorithms Interview Questions

COMP Data Structures

UNIT-2. Problem of size n. Sub-problem 1 size n/2. Sub-problem 2 size n/2. Solution to the original problem

Sorting is a problem for which we can prove a non-trivial lower bound.

Problem. Input: An array A = (A[1],..., A[n]) with length n. Output: a permutation A of A, that is sorted: A [i] A [j] for all. 1 i j n.

Sorting and Searching

Active Learning: Sorting

Lecture 6 Sorting and Searching

Lecture Notes 14 More sorting CSS Data Structures and Object-Oriented Programming Professor Clark F. Olson

CSC 273 Data Structures

CS61BL. Lecture 5: Graphs Sorting

CSC630/CSC730 Parallel & Distributed Computing

Quicksort. Alternative Strategies for Dividing Lists. Fundamentals of Computer Science I (CS F)

Parallel Sorting Algorithms

Unit-2 Divide and conquer 2016

Classic Data Structures Introduction UNIT I

Java How to Program, 9/e. Copyright by Pearson Education, Inc. All Rights Reserved.

Lecture 15: Algorithms. AP Computer Science Principles

Searching, Sorting. Arizona State University 1

Fast Bit Sort. A New In Place Sorting Technique. Nando Favaro February 2009

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix

Name: Lirong TAN 1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G.

CS240 Fall Mike Lam, Professor. Quick Sort

Divide-and-Conquer. Dr. Yingwu Zhu

1 5,9,2,7,6,10,4,3,8,1 The first number (5) is automatically the first number of the sorted list

CS301 - Data Structures Glossary By

S O R T I N G Sorting a list of elements implemented as an array. In all algorithms of this handout the sorting of elements is in ascending order

Sorting. CPSC 259: Data Structures and Algorithms for Electrical Engineers. Hassan Khosravi

CSE 154 LECTURE 11: REGULAR EXPRESSIONS

CSCI 121: Searching & Sorting Data

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer

Copyright 2012 by Pearson Education, Inc. All Rights Reserved.

Topic 17 Fast Sorting

9. The Disorganized Handyman

Divide and Conquer. Algorithm Fall Semester

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions

Key question: how do we pick a good pivot (and what makes a good pivot in the first place)?

Comparison Sorts. Chapter 9.4, 12.1, 12.2

Divide and Conquer Algorithms

106B Final Review Session. Slides by Sierra Kaplan-Nelson and Kensen Shi Livestream managed by Jeffrey Barratt

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)

Sorting. Sorting. Stable Sorting. In-place Sort. Bubble Sort. Bubble Sort. Selection (Tournament) Heapsort (Smoothsort) Mergesort Quicksort Bogosort

Lecture 6: Divide-and-Conquer

Introduction to Computers and Programming. Today

Merge Sort Roberto Hibbler Dept. of Computer Science Florida Institute of Technology Melbourne, FL

Hiroki Yasuga, Elisabeth Kolp, Andreas Lang. 25th September 2014, Scientific Programming

Mergesort again. 1. Split the list into two equal parts

Part II: Memory Management. Chapter 7: Physical Memory Chapter 8: Virtual Memory Chapter 9: Sharing Data and Code in Main Memory

DATA STRUCTURES AND ALGORITHMS

SORTING. Practical applications in computing require things to be in order. To consider: Runtime. Memory Space. Stability. In-place algorithms???

Better sorting algorithms (Weiss chapter )

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Sorting: Given a list A with n elements possessing a total order, return a list with the same elements in non-decreasing order.

CAD Algorithms. Categorizing Algorithms

Today. CISC101 Reminders & Notes. Searching in Python - Cont. Searching in Python. From last time

Pseudo code of algorithms are to be read by.

Algorithms and Data Structures (INF1) Lecture 7/15 Hua Lu

Information Coding / Computer Graphics, ISY, LiTH

Chapter 4. Divide-and-Conquer. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Visualizing Data Structures. Dan Petrisko

Sorting. Weiss chapter , 8.6

Linked Structures Songs, Games, Movies Part III. Fall 2013 Carola Wenk

Divide and Conquer Sorting Algorithms and Noncomparison-based

2/14/13. Outline. Part 5. Computational Complexity (2) Examples. (revisit) Properties of Growth-rate functions(1/3)

FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- MARCH, 2012 DATA STRUCTURE (Common to CT and IF) [Time: 3 hours

Merge Sort fi fi fi 4 9. Merge Sort Goodrich, Tamassia

Transcription:

DATA STRUCTURE AND ALGORITHM USING PYTHON Sorting, Searching Algorithm and Regular Expression Peter Lo

Sorting Algorithms Put Elements of List in Certain Order 2

Bubble Sort The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item bubbles up to the location where it belongs. If there are n items in the list, then there are n 1 pairs of items that need to be compared on the first pass. It is important to note that once the largest value in the list is part of a pair, it will continually be moved along until the pass is complete. 3

Bubble Sort Workflow fds 4

Example 5

Selection Sort The selection sort improves on the bubble sort by making only one exchange for every pass through the list. In order to do this, a selection sort looks for the smallest/largest value as it makes a pass and, after completing the pass, places it in the proper location. As with a bubble sort, after the first pass, the largest item is in the correct place. After the second pass, the next largest is in place. This process continues and requires n 1 passes to sort n items, since the final item must be in place after the (n 1) pass. 6

Selection Sort Workflow 7

Example (Find the Smallest) 8

Example (Find the Largest) 9

Insertion Sort The insertion sort always maintains a sorted sublist in the lower positions of the list. Each new item is then inserted back into the previous sublist such that the sorted sublist is one item larger. We begin by assuming that a list with one item (position 0) is already sorted. On each pass, one for each item 1 through n 1, the current item is checked against those in the already sorted sublist. As we look back into the already sorted sublist, we shift those items that are greater to the right. When we reach a smaller item or the end of the sublist, the current item can be inserted 10

Insertion Sort Workflow 11

Example 12

Merge Sort Merge Sort is a recursive algorithm that continually splits a list in half. If the list is empty or has one item, it is sorted by definition. If the list has more than one item, we split the list and recursively invoke a merge sort on both halves. Once the two halves are sorted, the fundamental operation, called a merge, is performed. Merging is the process of taking two smaller sorted lists and combining them together into a single, sorted, new list. 13

Merge Sort Workflow 1. Divide the list into two parts 2. Divide the list into two parts again 3. Break each element into single part 4. Sort the element from smallest to largest 5. Merge the divided sorted arrays together 6. The array has been sorted 14

Example 15

Quick Sort The Quick Sort algorithm consists of three steps: Divide: Partition the list To partition the list, we first choose a Pivot from the list for which we hope about half the elements will come before and half after. Then we partition the elements so that all those with value less than the pivot come in one sub list and all those with greater values come in another Recursion: Recursively sort the sub lists separately Conquer: Put the sorted sub lists together 16

Quick Sort Workflow 17

Example 18

Comparison of Sorting Algorithm Worst Case Average Case Selection Sort n 2 n 2 Bubble Sort n 2 n 2 Insertion Sort n 2 n 2 Merge Sort n x log n n x log n Quick Sort n 2 n x log n 19

Python sort() Function Python lists have a built-in list.sort() method that modifies the list in-place. There is also a sorted() built-in function that builds a new sorted list from an iterable. 20

Which algorithm does Python sorted() use? Timsort has been Python's standard sorting algorithm since version 2.3. Timsort is a hybrid sorting algorithm, derived from Merge Sort and Insertion Sort, designed to perform well on many kinds of real-world data. It was invented by Tim Peters in 2002 for use in the Python programming language. The algorithm finds subsets of the data that are already ordered, and uses the subsets to sort the data more efficiently. This is done by merging an identified subset, called a run, with existing runs until certain criteria are fulfilled. 21

Searching Algorithms Sequential Search, Binary Search 22

Sequential Search Starting at the first item in the list, we simply move from item to item, following the underlying sequential ordering until we either find what we are looking for or run out of items. If we run out of items, we have discovered that the item we were searching for was not present 23

Example 24

Binary Search Instead of searching the list in sequence, a binary search will start by examining the middle item. If that item is the one we are searching for, we are done. If it is not the correct item, we can use the ordered nature of the list to eliminate half of the remaining items. If the item we are searching for is greater than the middle item, we know that the entire lower half of the list as well as the middle item can be eliminated from further consideration. The item, if it is in the list, must be in the upper half. We can then repeat the process with the upper half. Start at the middle item and compare it against what we are looking for. Again, we either find it or split the list in half, therefore eliminating another large part of our possible search space. 25

Binary Search Workflow 26

Example 27

Comparison 28

Regular Expression A Simplified Guide 29

Regular Expression Module A regular expression in a programming language is a special text string used for describing a search pattern. It is extremely useful for extracting information from text such as code, files, log, spreadsheets or even documents. It is widely used in natural language processing, web applications that require validating string and pretty much most data science projects that involve text mining. In python, it is implemented in the standard module re. More information can be found in https://docs.python.org/3/library/re.html 30

What is a regex pattern? A regex pattern is a special language used to represent generic text, numbers or symbols so it can be used to extract texts that conform to that pattern. Consider an example expression \s+. Here the \s matches any whitespace character. By adding a '+' notation at the end will make the pattern match at least 1 or more spaces. So this pattern will match even tab characters as well. 31

Split String Separated by regex If you intend to use a particular pattern multiple times, then you are better off compiling a regular expression rather than using re.split over and over again. This file contain three column, but the separator are different. The '\s' matches any whitespace character. By adding a '+' notation at the end will make the pattern match at least 1 or more spaces. This pattern will match even tab '\t' characters as well 32

Greedy vs Non-greedy Matching Greedy matching gets the longest results possible Nongreedy matching gets the shortest possible Consider an String = 123 ABC 456 xyz For greedy expression: \d+ Result: ['123', '456 ] Maximizes the length of \d For non-greedy expression: \d+? Result: ['1', '2', '3', '4', '5', '6'] Minimizes the length of \d 33

Wildcards and Anchors. (a dot) matches any character except \n ".oo.y" matches "Doocy", "goofy", "LooPy",... use \. to literally match a dot. character 34

Wildcards and Anchors ^ matches the beginning of a line; $ the end "^fi$" matches lines that consist entirely of fi \< demands that pattern is the beginning of a word; \> demands that pattern is the end of a word "\<for\>" matches lines that contain the word "for" 35

Boolean means OR "abc def g" matches lines with "abc", "def", or "g" precedence of ^(Subject Date) vs. ^Subject Date: There's no AND symbol. 36

Grouping () are for grouping "(Homer Marge)" matches lines containing "Homer" or "Marge" 37

Finding Matched Pattern There are three method for searching matched pattern: findall() returns the matched portions of the text as a list search() returns a particular match object that contains the starting and ending positions of the first occurrence of the pattern. match() also returns a match object, but the difference is, it requires the pattern to be present at the beginning of the text itself. 38

Finding Pattern using findall The findall() method extracts all occurrences of the 1 or more digits from the text and returns them in a list. the special character '\d' is a regular expression which matches any digit. Adding a '+' symbol to it mandates the presence of at least 1 digit to be present in order to be found. 39

Finding Pattern using search The search() method return the found value together with its position. 40

Finding Pattern using match If the pattern is not present at the beginning of the text itself, the match() method unable to find the result. 41

Substitute Text with Another To replace texts, use the sub() method. 42

Regular Expression Quick Guide Symbol Meaning ^ Matches the beginning of a line $ Matches the end of the line. Matches any character except line terminators like \n. * Repeats a character zero or more times *? Repeats a character zero or more times (non-greedy) + Repeats a character one or more times +? Repeats a character one or more times (non-greedy) ( Indicates where string extraction is to start ) Indicates where string extraction is to end 43

Regular Expression Quick Guide Symbol \s Matches whitespace Meaning \S Matches any non-whitespace character \d Matches a digit \D Matches a non-digit \w Matches alphanumeric characters which means a-z, A-Z, 0-9 and underscore, _. \W Matches a non-alphanumeric characters \n Matches a new line [abcde] [^xyz] [a-z0-9] Matches a single character in the listed set Matches a single character not in the listed set The set of characters can include a range 44