Web Clients and Crawlers
Web Clients and Crawlers

1 Web Clients: alternatives to web browsers; opening a web page and copying its content
2 Scanning Files: looking for strings between double quotes; parsing URLs for the server location
3 Web Crawlers: making requests recursively; incremental development, modular design of code
4 Parsing HTML: the HTMLParser module; overriding methods in HTMLParser

MCS 507 Lecture 21
Mathematical, Statistical and Scientific Software
Jan Verschelde, 14 October 2013

Scientific Software (MCS 507 L-21), web clients and crawlers, 14 October 2013
Web Clients

Usually we download files with a web browser, but we can also browse the web with scripts. Why would we want to do this?

1 more efficient: no overhead from a GUI
2 in control: request only what we need, update to the most recent information
3 crawl the web: search recursively, operate like a search engine
4 the SymPy HTML documentation is interlinked: search the documentation files offline
web pages as files

Opening a web page with urllib.urlopen:

    from urllib import urlopen
    <object like file> = urlopen(<URL>)
    data = <object like file>.read(<size>)
    <object like file>.close()

We process web pages in the same way as we handle files.
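The schematic calls above are Python 2, where urlopen lives in urllib. In Python 3 the module was split and urlopen moved to urllib.request. The sketch below shows the same pattern in Python 3 syntax; to stay self-contained it opens a temporary local file through a file:// URL instead of a network address, and the page content is made up.

```python
import os
import tempfile
from urllib.request import urlopen, pathname2url
from urllib.parse import urljoin

# write a small made-up page to a temporary file
fd, name = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'w') as out:
    out.write('<html><body>hello</body></html>\n')

url = urljoin('file:', pathname2url(name))
page = urlopen(url)      # returns an object like a file
data = page.read(80)     # read at most 80 bytes (bytes, not str, in Python 3)
page.close()
os.remove(name)
print(data.decode())
```

Note that in Python 3 the read returns bytes, so the data must be decoded before string processing.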
opening a web page

The SymPy HTML documentation comes with the distribution of the sympy package.

    from urllib import urlopen
    URL = 'file:///users/jan/courses/mcs507/' \
        + 'sympy-docs-html/index.html'
    PAGE = urlopen(URL)
    DATA = PAGE.read(80)
    print '80 characters :', DATA
    LINES = DATA.split('\n')
    print 'the lines :', LINES
    PAGE.close()
function to copy a web page to a file

    def copypage(url, filename):
        """
        Given the URL for the web page, a copy of its
        contents is written to filename.
        Both url and filename are strings.
        """
        import urllib
        copyfile = open(filename, 'w')
        webpage = urllib.urlopen(url)
        while True:
            data = webpage.read(80)
            print data
            if data == '':
                break
            copyfile.write(data)
        webpage.close()
        copyfile.close()
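A Python 3 sketch of the same copypage function: urlopen now comes from urllib.request and returns bytes, so the local copy is opened in binary mode. The demonstration below runs on a temporary local file behind a file:// URL; the page content is made up.

```python
import os
import tempfile
from urllib.request import urlopen, pathname2url
from urllib.parse import urljoin

def copypage(url, filename):
    """Given the URL for the web page, writes a copy
       of its contents to the file filename."""
    copyfile = open(filename, 'wb')
    webpage = urlopen(url)
    while True:
        data = webpage.read(80)   # read in chunks of at most 80 bytes
        if data == b'':
            break
        copyfile.write(data)
    webpage.close()
    copyfile.close()

# demonstrate on a local file served via a file:// URL
fd, src = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'w') as out:
    out.write('<html><body>demo</body></html>')
dst = src + '.copy'
copypage(urljoin('file:', pathname2url(src)), dst)
with open(dst, 'rb') as infile:
    copied = infile.read()
os.remove(src)
os.remove(dst)
```

The chunked read-until-empty loop is the same pattern the slides use for every web page scan.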
urlretrieve

With urllib.urlretrieve we copy a web page to a file.

    >>> u = 'file:///users/jan/courses/mcs507/' \
    ...   + 'sympy-docs-html/index.html'
    >>> from urllib import urlretrieve
    >>> urlretrieve(u, 'mycopy.html')
    ('mycopy.html', <mimetools.Message instance at 0x1011cb878>)
    >>> f = open('mycopy.html', 'r')
    >>> d = f.readlines()
    >>> print d[:4]
    ['\r\n', '\r\n', '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\r\n', '"...ansitional.dtd">\r\n']
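In Python 3, urlretrieve moved to urllib.request; it still returns a (filename, headers) pair. A self-contained sketch on a temporary local file (the file contents are made up):

```python
import os
import tempfile
from urllib.request import urlretrieve, pathname2url
from urllib.parse import urljoin

# a made-up local page standing in for a remote one
fd, src = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'w') as out:
    out.write('<!DOCTYPE html>\n<html></html>\n')

url = urljoin('file:', pathname2url(src))
(name, headers) = urlretrieve(url, src + '.copy')
with open(name, 'r') as infile:
    lines = infile.readlines()
os.remove(src)
os.remove(name)
print(lines)
```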
scanning HTML files

Applications to scan an HTML file:
1 search for particular information,
2 navigate to the locations the page refers to.

Example (1): download all .py files from jan/mcs507/main.html
Example (2): retrieve all URLs the page refers to.

What is common between these two examples:
.py files and URLs appear between " and ",
so we scan for all strings between double quotes.
scanning for double quoted strings

Input: a file, or an object like a file.
Output: the list of all strings between double quotes.

Recall that we read files with a fixed size buffer:

    ..."....".."."...."..".".."...

A double quoted string may run across two buffers, so we need another buffer.

Two buffers:
1 read buffered data from the file,
2 scan the data buffer for double quoted strings.

Two functions:
1 one for reading strings from the file,
2 one for buffering a double quoted string.
reading strings from a file

    def quoted_strings(page):
        """
        Given a file page, this function scans the page
        and returns a list of all strings on the file
        enclosed between double quotes.
        """
        result = []
        buff = ''
        while True:
            data = page.read(80)
            if data == '':
                break
            (result, buff) = update_qstrings(result, \
                buff, data)
        return result
processing the buffers

In L we store the double quoted strings found so far.
In b we buffer the double quoted string currently being read, which may run across data buffers.
As the scan proceeds, every opening double quote starts filling b, and every closing double quote moves the contents of b into L and empties b again.

The update is done by one function:

    def update_qstrings(quoted, buff, sdat):
        """
        quoted is a list of double quoted strings,
        buff buffers a double quoted string, and
        sdat is the data string to be processed.
        Returns an update of (quoted, buff).
        """
code for update_qstrings

    def update_qstrings(quoted, buff, sdat):
        """
        Returns an update of (quoted, buff).
        """
        newb = buff
        for i in range(0, len(sdat)):
            if newb == '':
                if sdat[i] == '\"':
                    newb = 'o'   # o is for opened
            else:
                if sdat[i] != '\"':
                    newb += sdat[i]
                else:   # do not store the o
                    quoted.append(newb[1:len(newb)])
                    newb = ''
        return (quoted, newb)
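The two scanning functions can be exercised together without any files at all: io.StringIO wraps a string as an object like a file. The sketch below restores both functions in Python 3 syntax and runs them on a made-up snippet of HTML.

```python
from io import StringIO

def update_qstrings(quoted, buff, sdat):
    """quoted is a list of double quoted strings, buff buffers
       a double quoted string, and sdat is the data string to
       be processed.  Returns an update of (quoted, buff)."""
    newb = buff
    for char in sdat:
        if newb == '':
            if char == '"':
                newb = 'o'          # 'o' marks an opened quote
        else:
            if char != '"':
                newb += char
            else:                   # drop the leading 'o' marker
                quoted.append(newb[1:])
                newb = ''
    return (quoted, newb)

def quoted_strings(page):
    """Scans the file-like page and returns the list of all
       strings enclosed between double quotes."""
    result = []
    buff = ''
    while True:
        data = page.read(80)
        if data == '':
            break
        (result, buff) = update_qstrings(result, buff, data)
    return result

page = StringIO('<a href="index.html">home</a> <img src="logo.png">')
found = quoted_strings(page)
print(found)   # → ['index.html', 'logo.png']
```

Because buff is threaded through the calls, a quoted string that straddles two 80-character reads is still recovered in one piece.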
the function main()

    def main():
        """
        Prompts the user for a file name and
        scans the file for double quoted strings.
        """
        print 'getting double quoted strings'
        name = raw_input('Give a file name : ')
        datafile = open(name, 'r')
        qstr = quoted_strings(datafile)
        print qstr
        datafile.close()

    if __name__ == "__main__":
        main()
scanning web pages for URLs

Recall the second example application: list all URLs a page refers to.

    def main():
        """
        Prompts the user for a web page,
        and prints all URLs this page refers to.
        """
        print 'listing reachable locations'
        page = raw_input('Give a URL : ')
        links = http_links(page)
        print 'found %d HTTP links' % len(links)
        show_locations(links)
filtering the double quoted strings

    from scanquotes import update_qstrings

    def http_filter(names):
        """
        Returns from the list names only those
        strings which begin with http.
        """
        result = []
        for name in names:
            if len(name) > 4:
                if name[0:4] == 'http':
                    result.append(name)
        return result
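A quick check of the filter, in Python 3 syntax, on a made-up list of names; the URLs are illustrative only.

```python
def http_filter(names):
    """Returns from the list names only those
       strings which begin with http."""
    result = []
    for name in names:
        if len(name) > 4 and name[0:4] == 'http':
            result.append(name)
    return result

links = ['style.css', 'http://example.com', 'index.html', 'https://example.org']
filtered = http_filter(links)
print(filtered)   # → ['http://example.com', 'https://example.org']
```

Note that the prefix test on the first four characters accepts https URLs as well.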
scanning an HTML file for HTTP strings

    def http_links(url):
        """
        Given the URL for the web page,
        returns the list of all HTTP strings.
        """
        import urllib
        page = urllib.urlopen(url)
        result = []
        buff = ''
        while True:
            data = page.read(80)
            if data == '':
                break
            (result, buff) = update_qstrings(result, \
                buff, data)
        result = http_filter(result)
        page.close()
        return result
showing only server locations

Using the module urlparse: a URL consists of 6 parts,

    protocol://location/path;parameters?query#fragment

Given a URL u, urlparse.urlparse(u) returns a 6-tuple.

    def show_locations(locs):
        """
        Shows the locations of the URLs in locs.
        """
        from urlparse import urlparse
        for location in locs:
            tupl = urlparse(location)
            print tupl[1]
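In Python 3 the urlparse module became urllib.parse, but urlparse still splits a URL into the six parts above, with the server location at index 1. A small example on a made-up URL:

```python
from urllib.parse import urlparse

tupl = urlparse('http://www.example.com/docs/index.html?x=1#top')
print(tupl[0])   # the protocol (scheme): 'http'
print(tupl[1])   # the server location: 'www.example.com'
print(tupl[2])   # the path: '/docs/index.html'
```

The result also behaves as a named tuple, so tupl.netloc is equivalent to tupl[1].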
web crawlers

Scanning HTML files and browsing:
1 given a URL, open a web page,
2 compute the list of all URLs in the page,
3 for all URLs in the list do:
  1 open the web page defined by the location of the URL,
  2 compute the list of all URLs on that page,
and continue recursively, crawling the web.

Things to consider:
1 remove duplicates from the list of URLs,
2 do not turn back to pages visited before,
3 limit the levels of recursion,
4 some links will not work.

This is similar to finding a path in a maze.
modular design of the web crawler

    # L-21 MCS 507 Mon 14 Oct 2013 : webcrawler.py
    """
    Prompts the user for a URL and the maximal depth
    of the recursion tree.  Lists all locations of web
    servers that can be reached starting from the
    user given URL.
    """
    from scanhttplinks import http_links

Still left to write:
1 management of the list of server locations,
2 a recursive function to crawl the web.
retain only new locations

    def new_locations(urls, vis):
        """
        Given the list urls of new URLs and the list of
        already visited locations in vis, returns the
        list of new locations, not yet visited earlier.
        """
        from urlparse import urlparse
        result = []
        for name in urls:
            tup = urlparse(name)
            loc = tup[1]
            if not loc in result:
                if not loc in vis:
                    result.append(loc)
        return result
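A check of new_locations in Python 3 syntax, on made-up URLs: duplicates within the input list and locations already in vis are both dropped.

```python
from urllib.parse import urlparse

def new_locations(urls, vis):
    """Given the list urls of URLs and the list vis of already
       visited locations, returns the locations not yet visited."""
    result = []
    for name in urls:
        loc = urlparse(name)[1]    # index 1 is the server location
        if not loc in result:
            if not loc in vis:
                result.append(loc)
    return result

urls = ['http://a.edu/x.html', 'http://b.edu/y.html', 'http://a.edu/z.html']
fresh = new_locations(urls, ['b.edu'])
print(fresh)   # → ['a.edu']
```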
some links will not work ...

Add an exception handler:

    def http_links(url):
        """
        Given the URL for the web page,
        returns the list of all HTTP strings.
        """
        import urllib
        try:
            print 'opening ' + url + ' ...'
            page = urllib.urlopen(url)
        except IOError:
            print 'opening ' + url + ' failed'
            return []
unparsing URLs

Recall that we only store the server locations.
To open a web page we also need to specify the protocol.
We apply urlparse.urlunparse:

    >>> from urlparse import urlunparse
    >>> urlunparse(('http', ))

We must provide a 6-tuple as argument ...
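In Python 3, urlunparse lives in urllib.parse and indeed requires all six parts; empty strings stand in for the parts we do not store. A sketch on a made-up location:

```python
from urllib.parse import urlunparse

# the 6-tuple is (scheme, location, path, parameters, query, fragment)
url = urlunparse(('http', 'www.example.com', '', '', '', ''))
print(url)   # prints http://www.example.com
```

This is exactly the call the crawler uses to turn a stored server location back into an openable URL.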
running the crawler

    $ python webcrawler.py
    crawling the web ...
    Give URL :
    give maximal depth : 2
    opening ...
    opening ...
    opening ...

It takes a while ...

    total #locations : 538
the function main()

    def main():
        """
        Prompts the user for a web page, and prints
        all locations reachable from this page.
        """
        print 'crawling the web ...'
        page = raw_input('Give URL : ')
        depth = input('give maximal depth : ')
        locs = crawler(page, depth, [])
        print 'reachable locations :', locs
        print 'total #locations :', len(locs)

    if __name__ == "__main__":
        main()
code for the crawler

    def crawler(url, kst, locs):
        """
        Returns the list locs updated with the list
        of locations reachable from the given url
        using kst steps.
        """
        from urlparse import urlunparse
        links = http_links(url)
        newlocs = new_locations(links, locs)
        result = locs + newlocs
        if kst == 0:
            return result
        else:
            for loc in newlocs:
                name = urlunparse(('http', loc, \
                    '', '', '', ''))
                result = crawler(name, kst-1, result)
            return result
the HTMLParser module

To parse HTML code, in the standard Python distribution:

    >>> import HTMLParser
    >>> help(HTMLParser)

The class HTMLParser
1 allows to override the handlers of tags,
2 provides a feed method,
3 and the feed method handles the buffering.
using HTMLParser

    from HTMLParser import HTMLParser
    from urllib import urlopen

    class OurHTMLParser(HTMLParser):
        def __init__(self):
            ...
        def handle_starttag(self, tag, attrs):
            ...
        def handle_endtag(self, tag):
            ...

    def main():
        page = urlopen(url)
        pars = OurHTMLParser()
        while True:
            data = page.read(80)
            if data == '':
                break
            pars.feed(data)
        pars.close()
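The skeleton above can be filled in and run without any web access: feed accepts a string directly. In Python 3 the class lives in html.parser. A minimal subclass that records every tag it sees, fed a made-up page:

```python
from html.parser import HTMLParser

class OurHTMLParser(HTMLParser):
    """Records the start and end tags that are fed to it."""
    def __init__(self):
        HTMLParser.__init__(self)   # must initialize the parent class
        self.seen = []
    def handle_starttag(self, tag, attrs):
        self.seen.append('start:' + tag)
    def handle_endtag(self, tag):
        self.seen.append('end:' + tag)

pars = OurHTMLParser()
pars.feed('<html><body><b>hi</b></body></html>')
pars.close()
print(pars.seen)
```

Because feed handles the buffering, the same string could be fed in 80-character chunks with identical results.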
tallying the tags

Using the class HTMLParser we gather basic statistics about a page:
1 what types of tags are used,
2 the number of occurrences of each tag.

At the end of each tag the tally is updated.

Data structure for the tally: a dictionary,
1 keys: strings with the type of tag,
2 values: natural numbers counting the #occurrences.

The tally is an object data attribute.
the class Tagtally

    from HTMLParser import HTMLParser
    from urllib import urlopen

    class Tagtally(HTMLParser):
        """
        Makes a tally of ending tags.
        """
        def __init__(self):
            """
            Initializes the dictionary of tags.
            """
        def handle_endtag(self, tag):
            """
            Maintains a tally of the tags.
            """
        def show_tally(self):
            """
            Prints the tally to screen.
            """
the function main()

    def main():
        """
        Opens a web page and parses it.
        """
        url = ...
        print 'opening %s ...' % url
        page = urlopen(url)
        tags = Tagtally()
        while True:
            data = page.read(80)
            if data == '':
                break
            tags.feed(data)
        tags.close()
        print 'the tally of tags :'
        tags.show_tally()
constructor and show_tally

If we override __init__, we must initialize the parent class.

    def __init__(self):
        """
        Initializes the dictionary of tags.
        """
        HTMLParser.__init__(self)
        self.tally = {}

    def show_tally(self):
        """
        Prints the tally to screen.
        """
        for each in self.tally:
            print each, ':', self.tally[each]
update of the tally

Updating a dictionary: first check whether the tag is already present.

    def handle_endtag(self, tag):
        """
        Maintains a tally of the tags.
        """
        if self.tally.has_key(tag):
            self.tally[tag] = self.tally[tag] + 1
        else:
            self.tally.update({tag:1})

If the tag is not yet present, the update method adds it to the dictionary with a count of one.
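The complete class can be assembled and tested on a made-up page. In Python 3 the module is html.parser and the removed has_key method is replaced by the `tag in self.tally` membership test:

```python
from html.parser import HTMLParser

class Tagtally(HTMLParser):
    """Makes a tally of ending tags."""
    def __init__(self):
        """Initializes the dictionary of tags."""
        HTMLParser.__init__(self)
        self.tally = {}
    def handle_endtag(self, tag):
        """Maintains a tally of the tags."""
        if tag in self.tally:                # replaces has_key
            self.tally[tag] = self.tally[tag] + 1
        else:
            self.tally.update({tag: 1})
    def show_tally(self):
        """Prints the tally to screen."""
        for each in self.tally:
            print(each, ':', self.tally[each])

tags = Tagtally()
tags.feed('<html><body><p>one</p><p>two</p></body></html>')
tags.close()
tags.show_tally()
```

Only ending tags are counted, so unclosed tags such as <br> or <img> do not appear in the tally.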
Summary + Exercises

Python scripts can read web pages as files.

1 Write a script to download all .py files from jan/mcs507/main.html.
2 Limit the search of the crawler so that it only opens pages within the same domain.  For example, if we start at a location ending with edu, we only open pages with locations ending with edu.
3 Adjust webcrawler.py to search for a path between two locations.  The user is prompted for two URLs.  Crawling stops if a path has been found.
more exercises

4 Use HTMLParser to write a shorter version of the web crawler in the script webcrawler.py.

The fourth homework is due on Friday 18 October, 9AM:
exercises 3 and 6 of Lecture 13;
exercises 2 and 4 of Lecture 14;
exercises 1 and 3 of Lecture 15;
exercises 2 and 5 of Lecture 16;
exercises 3 and 4 of Lecture 17.
Web scraping and social media scraping crawling Jacek Lewkowicz, Dorota Celińska University of Warsaw March 21, 2018 What will we be working on today? We should already known how to gather data from a
More informationFile I/O, Benford s Law, and sets
File I/O, Benford s Law, and sets Matt Valeriote 11 February 2019 Benford s law Benford s law describes the (surprising) distribution of first digits of many different sets of numbers. Read it about it
More informationRandom Walks & Cellular Automata
Random Walks & Cellular Automata 1 Particle Movements basic version of the simulation vectorized implementation 2 Cellular Automata pictures of matrices an animation of matrix plots the game of life of
More informationProblem Set 4 Due: Start of class, October 5
CS242 Computer Networks Handout # 8 Randy Shull September 28, 2017 Wellesley College Problem Set 4 Due: Start of class, October 5 Reading: Kurose & Ross, Sections 2.5, 2.6, 2.7, 2.8 Problem 1 [4]: Free-riding
More informationget set up for today s workshop
get set up for today s workshop Please open the following in Firefox: 1. Poll: bit.ly/iuwim25 Take a brief poll before we get started 2. Python: www.pythonanywhere.com Create a free account Click on Account
More informationc122jan2714.notebook January 27, 2014
Internet Developer 1 Start here! 2 3 Right click on screen and select View page source if you are in Firefox tells the browser you are using html. Next we have the tag and at the
More informationHOWTO Fetch Internet Resources Using The urllib Package Release 3.5.1
HOWTO Fetch Internet Resources Using The urllib Package Release 3.5.1 Guido van Rossum and the Python development team February 24, 2016 Python Software Foundation Email: docs@python.org Contents 1 Introduction
More informationGraphical User Interfaces
to visualize Graphical User Interfaces 1 2 to visualize MCS 507 Lecture 12 Mathematical, Statistical and Scientific Software Jan Verschelde, 19 September 2011 Graphical User Interfaces to visualize 1 2
More informationSeshat Documentation. Release Joshua P Ashby
Seshat Documentation Release 1.0.0 Joshua P Ashby Apr 05, 2017 Contents 1 A Few Minor Warnings 3 2 Quick Start 5 2.1 Contributing............................................... 5 2.2 Doc Contents...............................................
More informationPYTHON CONTENT NOTE: Almost every task is explained with an example
PYTHON CONTENT NOTE: Almost every task is explained with an example Introduction: 1. What is a script and program? 2. Difference between scripting and programming languages? 3. What is Python? 4. Characteristics
More informationWeb Scraping. HTTP and Requests
1 Web Scraping Lab Objective: Web Scraping is the process of gathering data from websites on the internet. Since almost everything rendered by an internet browser as a web page uses HTML, the rst step
More informationmaking connections general transit feed specification stop names and stop times storing the connections in a dictionary
making connections 1 CTA Tables general transit feed specification stop names and stop times storing the connections in a dictionary 2 CTA Schedules finding connections between stops sparse matrices in
More informationOutline. general information policies for the final exam
Outline 1 final exam on Tuesday 5 May 2015, at 8AM, in BSB 337 general information policies for the final exam 2 some example questions strings, lists, dictionaries scope of variables in functions working
More informationHigh Level Parallel Processing
High Level Parallel Processing 1 GPU computing with Maple enabling CUDA in Maple 15 stochastic processes and Markov chains 2 Multiprocessing in Python scripting in computational science the multiprocessing
More informationThis course is designed for web developers that want to learn HTML5, CSS3, JavaScript and jquery.
HTML5/CSS3/JavaScript Programming Course Summary Description This class is designed for students that have experience with basic HTML concepts that wish to learn about HTML Version 5, Cascading Style Sheets
More informationIntermediate Python 3.x
Intermediate Python 3.x This 4 day course picks up where Introduction to Python 3 leaves off, covering some topics in more detail, and adding many new ones, with a focus on enterprise development. This
More informationexamples from first year calculus (continued), file I/O, Benford s Law
examples from first year calculus (continued), file I/O, Benford s Law Matt Valeriote 5 February 2018 Grid and Bisection methods to find a root Assume that f (x) is a continuous function on the real numbers.
More informationCSCE 463/612: Networks and Distributed Processing Homework 1 Part 2 (25 pts) Due date: 1/29/19
CSCE 463/612: Networks and Distributed Processing Homework 1 Part 2 (25 pts) Due date: 1/29/19 1. Problem Description You will now expand into downloading multiple URLs from an input file using a single
More informationHomework. Announcements. Before We Start. Languages. Plan for today. Chomsky Normal Form. Final Exam Dates have been announced
Homework Homework #3 returned Homework #4 due today Homework #5 Pg 169 -- Exercise 4 Pg 183 -- Exercise 4c,e,i Pg 184 -- Exercise 10 Pg 184 -- Exercise 12 Pg 185 -- Exercise 17 Due 10 / 17 Announcements
More information1 A file-based desktop search engine: FDSEE
A. V. Gerbessiotis CS 345 Fall 2015 HW 5 Sep 8, 2015 66 points CS 345: Homework 5 (Due: Nov 09, 2015) Rules. Teams of no more than three students as explained in syllabus (Handout 1). This is to be handed
More informationCME 193: Introduction to Scientific Python Lecture 4: Strings and File I/O
CME 193: Introduction to Scientific Python Lecture 4: Strings and File I/O Nolan Skochdopole stanford.edu/class/cme193 4: Strings and File I/O 4-1 Contents Strings File I/O Classes Exercises 4: Strings
More informationLecture 5. Defining Functions
Lecture 5 Defining Functions Announcements for this Lecture Last Call Quiz: About the Course Take it by tomorrow Also remember the survey Readings Sections 3.5 3.3 today Also 6.-6.4 See online readings
More informationfrom Recursion to Iteration
from Recursion to Iteration 1 Quicksort Revisited using arrays partitioning arrays via scan and swap recursive quicksort on arrays 2 converting recursion into iteration an iterative version with a stack
More informationIntroduction to Python
Introduction to Python NaLette Brodnax The Institute for Quantitative Social Science Harvard University January 24, 2018 what we ll cover first 0 Python installation one-minute poll 1 introduction what
More informationWeek 8: HyperText Transfer Protocol - Clients - HTML. Johan Bollen Old Dominion University Department of Computer Science
Week 8: HyperText Transfer Protocol - Clients - HTML Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen October 23, 2003 Page 1 MIDTERM
More informationPTN-202: Advanced Python Programming Course Description. Course Outline
PTN-202: Advanced Python Programming Course Description This 4-day course picks up where Python I leaves off, covering some topics in more detail, and adding many new ones, with a focus on enterprise development.
More informationCS61A Lecture 15 Object Oriented Programming, Mutable Data Structures. Jom Magrotker UC Berkeley EECS July 12, 2012
CS61A Lecture 15 Object Oriented Programming, Mutable Data Structures Jom Magrotker UC Berkeley EECS July 12, 2012 COMPUTER SCIENCE IN THE NEWS http://www.iospress.nl/ios_news/music to my eyes device converting
More informationIntroduction to MATLAB
Introduction to MATLAB The Desktop When you start MATLAB, the desktop appears, containing tools (graphical user interfaces) for managing files, variables, and applications associated with MATLAB. The following
More informationCS1 Lecture 2 Jan. 16, 2019
CS1 Lecture 2 Jan. 16, 2019 Contacting me/tas by email You may send questions/comments to me/tas by email. For discussion section issues, sent to TA and me For homework or other issues send to me (your
More informationScraping I: Introduction to BeautifulSoup
5 Web Scraping I: Introduction to BeautifulSoup Lab Objective: Web Scraping is the process of gathering data from websites on the internet. Since almost everything rendered by an internet browser as a
More informationSkills you will learn: How to make requests to multiple URLs using For loops and by altering the URL
Chapter 9 Your First Multi-Page Scrape Skills you will learn: How to make requests to multiple URLs using For loops and by altering the URL In this tutorial, we will pick up from the detailed example from
More informationLecture 4: Data Collection and Munging
Lecture 4: Data Collection and Munging Instructor: Outline 1 Data Collection and Scraping 2 Web Scraping basics In-Class Quizzes URL: http://m.socrative.com/ Room Name: 4f2bb99e Data Collection What you
More informationCustom Actions for argparse Documentation
Custom Actions for argparse Documentation Release 0.4 Hai Vu October 26, 2015 Contents 1 Introduction 1 2 Information 3 2.1 Folder Actions.............................................. 3 2.2 IP Actions................................................
More informationBabu Madhav Institute of Information Technology, UTU 2015
Five years Integrated M.Sc.(IT)(Semester 5) Question Bank 060010502:Programming in Python Unit-1:Introduction To Python Q-1 Answer the following Questions in short. 1. Which operator is used for slicing?
More informationWeek - 01 Lecture - 04 Downloading and installing Python
Programming, Data Structures and Algorithms in Python Prof. Madhavan Mukund Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 01 Lecture - 04 Downloading and
More informationSpring 2008 June 2, 2008 Section Solution: Python
CS107 Handout 39S Spring 2008 June 2, 2008 Section Solution: Python Solution 1: Jane Austen s Favorite Word Project Gutenberg is an open-source effort intended to legally distribute electronic copies of
More informationpicrawler Documentation
picrawler Documentation Release 0.1.1 Ikuya Yamada October 07, 2013 CONTENTS 1 Installation 3 2 Getting Started 5 2.1 PiCloud Setup.............................................. 5 2.2 Basic Usage...............................................
More informationLarge-Scale Networks
Large-Scale Networks 3b Python for large-scale networks Dr Vincent Gramoli Senior lecturer School of Information Technologies The University of Sydney Page 1 Introduction Why Python? What to do with Python?
More informationNotes General. IS 651: Distributed Systems 1
Notes General Discussion 1 and homework 1 are now graded. Grading is final one week after the deadline. Contract me before that if you find problem and want regrading. Minor syllabus change Moved chapter
More informationIntroduction to Python
Introduction to Python Version 1.1.5 (12/29/2008) [CG] Page 1 of 243 Introduction...6 About Python...7 The Python Interpreter...9 Exercises...11 Python Compilation...12 Python Scripts in Linux/Unix & Windows...14
More information(b) If a heap has n elements, what s the height of the tree?
CISC 5835 Algorithms for Big Data Fall, 2018 Homework Assignment #4 1 Answer the following questions about binary tree and heap: (a) For a complete binary tree of height 4 (we consider a tree with just
More information