Introduction to Web Mining for Social Scientists Lecture 4: Web Scraping Workshop Prof. Dr. Ulrich Matter (University of St. Gallen) 10/10/2018


1 First Steps in R: Part II

In the previous week we looked at the very basics of using R: how to initiate a variable, R as a calculator, data structures, functions, etc. All of this was focused on executing one command after another (or several commands at once) in an interactive R session. Apart from defining a function, we have not really looked at how to program with R. A large part of basic programming has to do with automating the execution of a number of commands conditional on some control statements. That is, we want to tell the computer to do something until a certain goal is reached. In the simplest case, this boils down to a control flow statement that specifies an iteration, a so-called loop.

1.1 Loops

A loop is typically a sequence of statements that is executed a specific number of times. How often the code inside the loop is executed depends on a (hopefully) clearly defined control statement. If we know in advance how often the code inside the loop has to be executed, we typically write a so-called for-loop. If the number of iterations is not clearly known before executing the code, we typically write a so-called while-loop. The following subsections illustrate both of these concepts in R.

1.1.1 For-loops

In simple terms, a for-loop tells the computer to execute a sequence of commands for each case in a set of n cases. For example, a for-loop could be used to sum up the elements of a numeric vector of fixed length (thus the number of iterations is clearly defined). In plain English, the for-loop would state something like: "Start with 0 as the current total value; for each of the elements in the vector, add the value of that element to the current total value." Note how this logically implies that the loop stops once the value of the last element in the vector has been added to the total.
Let's illustrate this in R. Take the numeric vector c(1,2,3,4,5). A for-loop to sum up all elements can be implemented as follows:

```r
# vector to be summed up
numbers <- c(1,2,3,4,5)
# initiate total
total_sum <- 0
# number of iterations
n <- length(numbers)

# start loop
for (i in 1:n) {
  total_sum <- total_sum + numbers[i]
}

# check result
total_sum
```

```
[1] 15
```
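As a small aside (not part of the original lecture code): instead of 1:n, the loop index can be generated with seq_along(). The result is the same here, but seq_along() also behaves correctly for an empty vector, where 1:length(x) would yield c(1, 0) and run the loop body twice.

```r
# same for-loop, using seq_along() instead of 1:n
numbers <- c(1, 2, 3, 4, 5)
total_sum <- 0
for (i in seq_along(numbers)) {
  total_sum <- total_sum + numbers[i]
}
total_sum  # 15

# with an empty vector, seq_along() yields no iterations at all
empty_sum <- 0
for (i in seq_along(numeric(0))) {
  empty_sum <- empty_sum + 1
}
empty_sum  # still 0
```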

```r
# compare with the result of the sum() function
sum(numbers)
```

```
[1] 15
```

In some situations a simple for-loop might not be sufficient. Within one sequence of commands there might be another sequence of commands that also has to be executed a number of times each time the first sequence of commands is executed. In such a case we speak of a nested for-loop. We can illustrate this easily by extending the example of the numeric vector above to a matrix for which we want to sum up the values in each column. Building on the loop implemented above, we would say: for each column j of a given numeric matrix, execute the for-loop defined above.

```r
# matrix to be summed up
numbers_matrix <- matrix(1:20, ncol = 4)
numbers_matrix
```

```
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
```

```r
# number of iterations for outer loop
m <- ncol(numbers_matrix)
# number of iterations for inner loop
n <- nrow(numbers_matrix)

# start outer loop (loop over columns of matrix)
for (j in 1:m) {
  # initiate total
  total_sum <- 0
  # start inner loop
  for (i in 1:n) {
    total_sum <- total_sum + numbers_matrix[i, j]
  }
  print(total_sum)
}
```

```
[1] 15
[1] 40
[1] 65
[1] 90
```

1.1.2 While-loop

In a situation where a program has to repeatedly run a sequence of commands but we don't know in advance how many iterations are needed to reach the intended goal, a while-loop can help. In simple terms, a while-loop keeps executing a sequence of commands as long as a certain logical statement is true. The flow chart in Figure 1 illustrates this point. For example, a while-loop in plain English could state something like: "Start with 0 as the total; add 1.12 to the total until the total is larger than 20." We can implement this in R as follows.

```r
# initiate starting value
total <- 0

# start loop
```

Figure 1: While-loop illustration. Source: While-loop-diagram.svg.

```r
while (total <= 20) {
  total <- total + 1.12
}

# check the result
total
```

```
[1] 20.16
```

1.2 Loops and Web Scraping

Both types of loops are very helpful in many web scraping tasks. Note how the web scraping example of last week (the "blueprint") is designed to run only for one specific Amazon product review (based on the product id). We can easily imagine extending the scraper to gather more data. For example, we could first collect a set of product ids for which we want to collect all reviews. We could then implement this with a for-loop that iterates through the product ids and stops once all of them have been used. Alternatively, we could imagine an extension of the basic review scraper that first scrapes all the reviews of one product id and then continues to scrape all reviews of all the products that the reviewer of the initial review also reviewed, and so on, until we have collected a certain number of reviews (or reviews of a certain number of reviewers, etc.). The following extended examples show the practical use of loops in different web scraping contexts.

2 Web Scraping in Action

2.1 Extracting Voting Tables from the U.S. Senate

A simple but very practical web scraping task is to extract data from HTML tables on a website. If we have to do this only once, R might not even be necessary: we might get the data simply by marking the table in a web browser and copy-pasting it into a spreadsheet program such as Excel (and saving it as CSV, etc.). However, it is often the case that we have to repeatedly extract various tables from the same website. The following exercise shows how this can be done in the context of data on roll-call voting in the U.S. Senate. The scraper is made to extract all roll-call voting results for a given list of congresses, and combine

them in one table. The data will be extracted automatically from the official website of the U.S. Senate, where the data for the last few congresses are available on one page per session and congress. For example, the URL https://www.senate.gov/legislative/LIS/roll_call_lists/vote_menu_113_1.htm points to the page providing the data for the first session of the 113th U.S. Congress. First, we inspect the source code with the developer tools and figure out how the URLs are constructed. Based on this, we define the header section of a new R script for this scraper. As we want to extract data on voting results from various congresses and sessions, we define the fixed variables CONGRESS and SESSION as vectors.

```r
# Introduction to Web Data Mining
# Lecture 4: Roll Call Data Scraper (HTML Tables)
#
# This is a basic web scraper to automatically extract data on
# roll call vote outcomes in the U.S. Senate. The data is
# extracted directly from the official government website.
# See www.senate.gov/legislative/LIS/roll_call_lists/vote_menu_113_1.htm
# for an example of the type of page to be scraped.
#
# U. Matter, October 2017

# PREAMBLE -----

# load packages
library(httr)
library(xml2)
library(rvest)

# initiate fix variables
BASE_URL <- "https://www.senate.gov/legislative/LIS/roll_call_lists/"
CONGRESS <- c(110:114)
SESSION <- c(1, 2)
```

Following the blueprint outlined in the previous week, we write the three components of the scraper. However, in this case we aim to place all the components in a for-loop in order to iterate through all the pages from which we want to extract the tables with voting results. The three components of our web scraper will thus form the body of the for-loop. That is, they build the sequence of commands that is executed repeatedly until we have all the data we want to collect. From inspecting the website of the U.S. Senate, we learn that in order to collect all the roll-call data from the 110th to the 114th Congress, we have to iterate not only through each congress but also through each of the two sessions of a congress (each congress consists of two sessions). Thus, for each congress and each session per congress, we want to extract the voting data.
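Before adding the scraping components, it can help to enumerate the congress-session pairs the nested loop will visit, for example to check that the constructed file names look right. A quick sketch (the vote_menu_ naming pattern is taken from the scraper code; the full URLs additionally require the base URL):

```r
CONGRESS <- c(110:114)
SESSION <- c(1, 2)

# enumerate all congress-session pairs the nested loop iterates over
# (session varies fastest, matching the inner loop over sessions)
pairs <- expand.grid(session = SESSION, congress = CONGRESS)
# file names as constructed in component I of the scraper
pages <- paste0("vote_menu_", pairs$congress, "_", pairs$session, ".htm")

length(pages)  # 10 pages: 5 congresses x 2 sessions
head(pages, 2) # "vote_menu_110_1.htm" "vote_menu_110_2.htm"
```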
This implies a nested for-loop: in the outer loop we iterate through the individual congresses; in the inner loop (that is, given a specific congress), we iterate through the sessions. Another key aspect to settle before getting started is what the result of each iteration is and how we collect/merge the individual results. As the overall goal of the scraper is to extract data from HTML tables, a reasonable format in which to store the data of each iteration is a data.frame. Each iteration will thus result in a data.frame, which implies that we have to store each of these data frames while running the loop. We can do this with a list. Before starting the loop, we initiate an empty list: all_tables <- list(NULL). Then, within the loop, we add each of the extracted tables (now objects of class data.frame) as an additional element of that list. The following code chunk contains the blueprint for the loop following this strategy (without the actual loop body, i.e., the three scraper components).

```r
# initiate variables for iteration
n_congr <- length(CONGRESS)
```

```r
n_session <- length(SESSION)
all_tables <- list(NULL)

# start iteration
for (i in 1:n_congr) {
  for (j in 1:n_session) {

    # ADD COMPONENTS I TO III HERE!

    # add resulting table to list
    rc_table_list <- list(rc_table)
    all_tables <- c(all_tables, rc_table_list)
  }
}
```

Note that in order to add an extracted table (here: a data frame called rc_table) to the list, we first have to wrap it in a list, rc_table_list <- list(rc_table), and then concatenate it to the list containing all tables: all_tables <- c(all_tables, rc_table_list). The code above does not do anything on its own yet. We have to fill in the three components containing the actual scraping tasks in the body of the loop. When developing each of the components, it is helpful to write them for one iteration only (ignoring the loop for a moment). This way we can test each component step by step before iterating over it many times. A simple way to do this is to manually assign values to the index variables i and j: i <- 1; j <- 1. The first component (interaction with the server, parsing the response, etc.) is then straightforwardly implemented and tested as follows.

```r
# I) Handle URL, HTTP request and response, parse HTML

# build the URL
page <- paste0("vote_menu_", CONGRESS[i], "_", SESSION[j], ".htm")
rc_url <- paste0(BASE_URL, page)

# request webpage, parse results
rc_resp <- GET(rc_url)
rc_html <- read_html(rc_resp)
```

As usual, we have to figure out (with the help of the developer tools) how to extract the specific part of the HTML document which contains the data of interest. In this particular case, the XPath expression ".//*[@id='secondary_col2']/table" provides the result we are looking for in the second component:

```r
# II) Extract the data of interest

# extract the table
rc_table_node <- html_node(rc_html, xpath = ".//*[@id='secondary_col2']/table")
rc_table <- html_table(rc_table_node)
```

Finally, in the last component, we prepare the extracted data for further processing.
When looking at the result of the previous component (head(rc_table)), we note that the extracted table does not actually contain information about which congress and session it is from. We add this information by adding two new columns.

```r
# III) Format and save data for further processing

# add additional variables
rc_table$congress <- CONGRESS[i]
rc_table$session <- SESSION[j]
```

With this, we have the extracted data from one iteration (one congress-session pair) in the form we want. Once we have tested each of the components and are happy with the overall result for one iteration, we can add them to the body of the loop and put all parts together.
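Component III can also be tried out without a live HTTP request by running it on a small stand-in table. The data frame below is hypothetical and merely mimics the shape of the table returned by html_table():

```r
# stand-in for an extracted table (the real rc_table comes from html_table())
rc_table <- data.frame("Vote (Tally)" = c("442 (93-0)", "441 (76-17)"),
                       "Result" = c("Confirmed", "Agreed to"),
                       check.names = FALSE)
CONGRESS <- c(110:114)
SESSION <- c(1, 2)
i <- 1
j <- 1

# III) add the congress and session columns for this iteration
rc_table$congress <- CONGRESS[i]
rc_table$session <- SESSION[j]

rc_table$congress  # 110 110
rc_table$session   # 1 1
```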

```r
# Introduction to Web Mining
# Lecture 4: Roll Call Data Scraper (HTML Tables)
#
# This is a basic web scraper to automatically extract data on
# roll call vote outcomes in the U.S. Senate. The data is
# extracted directly from the official government website.
# See www.senate.gov/legislative/LIS/roll_call_lists/vote_menu_113_1.htm
# for an example of the type of page to be scraped.
#
# U. Matter, October 2017

# PREAMBLE -----

# load packages
library(httr)
library(xml2)
library(rvest)

# initiate fix variables
BASE_URL <- "https://www.senate.gov/legislative/LIS/roll_call_lists/"
CONGRESS <- c(110:114)
SESSION <- c(1, 2)

# SCRAPER -----

# initiate variables for iteration
n_congr <- length(CONGRESS)
n_session <- length(SESSION)
all_tables <- list(NULL)

# start iteration
for (i in 1:n_congr) {
  for (j in 1:n_session) {

    # I) Handle URL, HTTP request and response, parse HTML

    # build the URL
    page <- paste0("vote_menu_", CONGRESS[i], "_", SESSION[j], ".htm")
    rc_url <- paste0(BASE_URL, page)

    # request webpage, parse results
    rc_resp <- GET(rc_url)
    rc_html <- read_html(rc_resp)

    # II) Extract the data of interest

    # extract the table
    rc_table_node <- html_node(rc_html, xpath = ".//*[@id='secondary_col2']/table")
    # alternatively: html_node(rc_html, css = "table")
    rc_table <- html_table(rc_table_node)

    # III) Format and save data for further processing

    # add additional variables
    rc_table$congress <- CONGRESS[i]
    rc_table$session <- SESSION[j]
```

```r
    # add resulting table to list
    rc_table_list <- list(rc_table)
    all_tables <- c(all_tables, rc_table_list)
  }
}
```

As a last step, once the loop has finished, we can stack the individual data frames together to get one large data frame, which we can then store locally as a CSV file to further work with the collected data.

```r
# combine all tables in one:
big_table <- do.call("rbind", all_tables)

# write result to file
write.csv(x = big_table, file = "data/3_senate_rc.csv", row.names = FALSE)
```

The first rows and columns of the resulting CSV file:

```
Vote (Tally)   Result
442 (93-0)     Confirmed
441 (76-17)    Agreed to
440 (48-46)    Rejected
439 (70-25)    Agreed to
438 (50-45)    Rejected
437 (24-71)    Rejected
```

2.2 A Simple Text Scraper for Wikipedia

In this exercise we write an R script that looks up a number of terms on Wikipedia, parses the search results, extracts the text of the found page, and saves it locally as a text file. As usual, we first inspect the website with the developer tools and have a close look at the part of the website containing the search field. We recognize that the HTML form's action attribute indicates a relative link, /w/index.php. This tells us that once a user hits enter to submit what she entered in the form, the search term will be further processed by a PHP script on Wikipedia's server. From this, however, we do not yet know how the data will be submitted, or in other words, how we have to formulate either a GET or a POST request in order to mimic a user typing queries into the search field. In order to understand how the search function on Wikipedia pages works under the hood, we open the Network panel in the Firefox Developer Tools and switch the HTML filter on (as we are only interested in the traffic related to HTML documents). We then type "Donald Trump" into the search field of the Wikipedia page and hit enter. The first entry of the network panel shows us the first transfer recorded after we hit enter.
It tells us that the search function works by sending a GET request, with a URL pointing to the PHP script discovered above, to the server. We can copy the exact URL of the GET request by right-clicking on it in the network panel and selecting Copy/Copy URL. We can then verify that this is actually how the Wikipedia search function works by pasting the copied URL back into the Firefox address bar and hitting enter. Finally, we can test whether we correctly understand how the URL for a query needs to be constructed by replacing the Donald+Trump part with Barack+Obama and seeing what

we get. Based on our insights about how the search field on Wikipedia works, we can start implementing our scraper. In the documentation of this script it is helpful to point out that there are two important types of URLs to consider here: one as an example of a page to scrape data from, and one pointing to the search function. Since different parts of the URL to Wikipedia's search function will come in handy, we define the parsed URL from our Donald Trump example as a fix variable. The aim of the scraper is to extract the text of the returned search result (the found Wikipedia entry) and store it locally in a text file. Therefore, we already define an output directory (RESULTS_DIR <- "data/wikipedia/") where the results should be stored.

```r
# Introduction to Web Mining
# Lecture 4: Wikipedia Search Form Scraper
#
# This is a basic web scraper to automatically look up search terms
# in Wikipedia and extract the text of the returned page.
# See en.wikipedia.org/wiki/Donald_Trump for an example of the type
# of page to be scraped. See
# en.wikipedia.org/w/index.php?search=Donald+Trump for the type of
# URL used by Wikipedia's search function.
#
# U. Matter, October 2017

# PREAMBLE -----

# load packages
library(httr)
library(xml2)
library(rvest)
library(stringi)

# initiate fix variables
SEARCH_URL <- parse_url("https://en.wikipedia.org/w/index.php?search=Donald+Trump")
SEARCH_TERM <- "Barack Obama"
RESULTS_DIR <- "data/wikipedia/"
```

As we have parsed the rather complex URL used to perform searches on Wikipedia in the example above, we can simply modify the resulting object by replacing the respective query parameter (search), SEARCH_URL$query$search <- SEARCH_TERM, and then use the function build_url() to construct the URL for an individual request. The rest of the first component follows straightforwardly from the blueprint.
```r
# I) URL, HANDLE HTTP REQUEST AND THE RESPONSE ----

# build the URL (update search term)
SEARCH_URL$query$search <- SEARCH_TERM

# fetch the website via an HTTP GET request
URL <- build_url(SEARCH_URL)
search_result <- GET(URL)

# parse the content of the response (the html code)
search_result_html <- read_html(search_result)
# or, alternatively: body <- content(search_result)
```

In the second component, we first identify the part of the parsed HTML document that we want to extract. Given how Wikipedia pages are currently built, a straightforward way to do this is to select all paragraphs (<p>) that are embedded in a <div> tag of class mw-parser-output. The XPath expression ".//*[@class='mw-parser-output']/p" thus captures all the HTML elements with content of

interest. In order to extract the text from those elements, we simply apply the html_text() function.

```r
# II) filter HTML, extract data ----
content_nodes <- html_nodes(search_result_html,
                            xpath = ".//*[@class='mw-parser-output']/p")
content_text <- html_text(content_nodes)
```

Finally, in the last component we define the name of the text file to which we want to save the extracted text.[1]

```r
# III) write text to file ----
filepath <- paste0(RESULTS_DIR,
                   stri_replace_all_fixed(str = SEARCH_TERM, " ", ""),
                   ".txt")
write(content_text, filepath)
```

Putting all parts together, we can start using this script to automate the extraction of text from Wikipedia for any search term. Given the previous exercise, it should be straightforward to tweak this script in order to extract text from various pages based on a number of search terms (via a loop).

```r
# Introduction to Web Mining
# Lecture 4: Wikipedia Search Form Scraper
#
# This is a basic web scraper to automatically look up search terms
# in Wikipedia and extract the text of the returned page.
# See en.wikipedia.org/wiki/Donald_Trump for an example of the type
# of page to be scraped. See
# en.wikipedia.org/w/index.php?search=Donald+Trump for the type of
# URL used by Wikipedia's search function.
#
# U. Matter, October 2017

# PREAMBLE -----

# load packages
library(httr)
library(xml2)
library(rvest)
library(stringi)

# initiate fix variables
SEARCH_URL <- parse_url("https://en.wikipedia.org/w/index.php?search=Donald+Trump")
SEARCH_TERM <- "Barack Obama"
RESULTS_DIR <- "data/wikipedia/"

# I) URL, HANDLE HTTP REQUEST AND THE RESPONSE ----

# build the URL (update search term)
SEARCH_URL$query$search <- SEARCH_TERM

# fetch the website via an HTTP GET request
URL <- build_url(SEARCH_URL)
search_result <- GET(URL)

# parse the content of the response (the html code)
search_result_html <- read_html(search_result)
# or, alternatively: body <- content(search_result)
```

[1] The function stri_replace_all_fixed() is used here to remove all the white space from the search term. Thus, in the case of a search with the term Donald Trump, the extracted data would be stored in a text file with the path data/wikipedia/DonaldTrump.txt.

```r
# II) filter HTML, extract data ----
content_nodes <- html_nodes(search_result_html,
                            xpath = ".//*[@class='mw-parser-output']/p")
content_text <- html_text(content_nodes)

# III) write text to file ----
filepath <- paste0(RESULTS_DIR,
                   stri_replace_all_fixed(str = SEARCH_TERM, " ", ""),
                   ".txt")
write(content_text, filepath)
```

3 References


More information

REST in a Nutshell: A Mini Guide for Python Developers

REST in a Nutshell: A Mini Guide for Python Developers REST in a Nutshell: A Mini Guide for Python Developers REST is essentially a set of useful conventions for structuring a web API. By "web API", I mean an API that you interact with over HTTP - making requests

More information

XML Processing & Web Services. Husni Husni.trunojoyo.ac.id

XML Processing & Web Services. Husni Husni.trunojoyo.ac.id XML Processing & Web Services Husni Husni.trunojoyo.ac.id Based on Randy Connolly and Ricardo Hoar Fundamentals of Web Development, Pearson Education, 2015 Objectives 1 XML Overview 2 XML Processing 3

More information

Tips & Tricks Making Accessible MS Word Documents

Tips & Tricks Making Accessible MS Word Documents Use Headings Why? Screen readers do not read underline and bold as headings. A screen reader user will not know that text is a heading unless you designate it as such. When typing a new section heading,

More information

Web scraping tools, a real life application

Web scraping tools, a real life application Web scraping tools, a real life application ESTP course on Automated collection of online proces: sources, tools and methodological aspects Guido van den Heuvel, Dick Windmeijer, Olav ten Bosch, Statistics

More information

Using Dreamweaver. 5 More Page Editing. Bulleted and Numbered Lists

Using Dreamweaver. 5 More Page Editing. Bulleted and Numbered Lists Using Dreamweaver 5 By now, you should have a functional template, with one simple page based on that template. For the remaining pages, we ll create each page based on the template and then save each

More information

INTRODUCTION (1) Recognize HTML code (2) Understand the minimum requirements inside a HTML page (3) Know what the viewer sees and the system uses

INTRODUCTION (1) Recognize HTML code (2) Understand the minimum requirements inside a HTML page (3) Know what the viewer sees and the system uses Assignment Two: The Basic Web Code INTRODUCTION HTML (Hypertext Markup Language) In the previous assignment you learned that FTP was just another language that computers use to communicate. The same holds

More information

E-Business Systems 1 INTE2047 Lab Exercises. Lab 5 Valid HTML, Home Page & Editor Tables

E-Business Systems 1 INTE2047 Lab Exercises. Lab 5 Valid HTML, Home Page & Editor Tables Lab 5 Valid HTML, Home Page & Editor Tables Navigation Topics Covered Server Side Includes (SSI) PHP Scripts menu.php.htaccess assessment.html labtasks.html Software Used: HTML Editor Background Reading:

More information

get set up for today s workshop

get set up for today s workshop get set up for today s workshop Please open the following in Firefox: 1. Poll: bit.ly/iuwim25 Take a brief poll before we get started 2. Python: www.pythonanywhere.com Create a free account Click on Account

More information

2nd Year PhD Student, CMU. Research: mashups and end-user programming (EUP) Creator of Marmite

2nd Year PhD Student, CMU. Research: mashups and end-user programming (EUP) Creator of Marmite Mashups Jeff Wong Human-Computer Interaction Institute Carnegie Mellon University jeffwong@cmu.edu Who am I? 2nd Year PhD Student, HCII @ CMU Research: mashups and end-user programming (EUP) Creator of

More information

c122sep814.notebook September 08, 2014 All assignments should be sent to Backup please send a cc to this address

c122sep814.notebook September 08, 2014 All assignments should be sent to Backup please send a cc to this address All assignments should be sent to p.grocer@rcn.com Backup please send a cc to this address Note that I record classes and capture Smartboard notes. They are posted under audio and Smartboard under XHTML

More information

Creating Codes with Spreadsheet Upload

Creating Codes with Spreadsheet Upload Creating Codes with Spreadsheet Upload In order to create a code, you must first have a group, prefix and account set up and associated to each other. This document will provide instructions on creating

More information

Group Administrator Guide

Group Administrator Guide Get Started... 4 What a Group Administrator Can Do... 7 About Premier... 10 Use Premier... 11 Use the AT&T IP Flexible Reach Customer Portal... 14 Search and Listing Overview... 17 What s New in the Group

More information

Year 8 Computing Science End of Term 3 Revision Guide

Year 8 Computing Science End of Term 3 Revision Guide Year 8 Computing Science End of Term 3 Revision Guide Student Name: 1 Hardware: any physical component of a computer system. Input Device: a device to send instructions to be processed by the computer

More information

Using Dreamweaver CC. 5 More Page Editing. Bulleted and Numbered Lists

Using Dreamweaver CC. 5 More Page Editing. Bulleted and Numbered Lists Using Dreamweaver CC 5 By now, you should have a functional template, with one simple page based on that template. For the remaining pages, we ll create each page based on the template and then save each

More information

Hyper- Any time any where go to any web pages. Text- Simple Text. Markup- What will you do

Hyper- Any time any where go to any web pages. Text- Simple Text. Markup- What will you do HTML Interview Questions and Answers What is HTML? Answer1: HTML, or HyperText Markup Language, is a Universal language which allows an individual using special code to create web pages to be viewed on

More information

BeautifulSoup: Web Scraping with Python

BeautifulSoup: Web Scraping with Python : Web Scraping with Python Andrew Peterson Apr 9, 2013 files available at: https://github.com/aristotle-tek/_pres Roadmap Uses: data types, examples... Getting Started downloading files with wget : in

More information

HTML and CSS a further introduction

HTML and CSS a further introduction HTML and CSS a further introduction By now you should be familiar with HTML and CSS and what they are, HTML dictates the structure of a page, CSS dictates how it looks. This tutorial will teach you a few

More information

5/10/2009. Introduction. The light-saber is a Jedi s weapon not as clumsy or random as a blaster.

5/10/2009. Introduction. The light-saber is a Jedi s weapon not as clumsy or random as a blaster. The Hacking Protocols and The Hackers Sword The light-saber is a Jedi s weapon not as clumsy or random as a blaster. Obi-Wan Kenobi, Star Wars: Episode IV Slide 2 Introduction Why are firewalls basically

More information

Quick.JS Documentation

Quick.JS Documentation Quick.JS Documentation Release v0.6.1-beta Michael Krause Jul 22, 2017 Contents 1 Installing and Setting Up 1 1.1 Installation................................................ 1 1.2 Setup...................................................

More information

Lab 4: Bash Scripting

Lab 4: Bash Scripting Lab 4: Bash Scripting February 20, 2018 Introduction This lab will give you some experience writing bash scripts. You will need to sign in to https://git-classes. mst.edu and git clone the repository for

More information

Creating an with Constant Contact. A step-by-step guide

Creating an  with Constant Contact. A step-by-step guide Creating an Email with Constant Contact A step-by-step guide About this Manual Once your Constant Contact account is established, use this manual as a guide to help you create your email campaign Here

More information

Aligned Elements Importer V user manual. Aligned AG Tellstrasse Zürich Phone: +41 (0)

Aligned Elements Importer V user manual. Aligned AG Tellstrasse Zürich Phone: +41 (0) Aligned Elements Importer V2.4.211.14302 user manual Aligned AG Tellstrasse 13 8004 Zürich Phone: +41 (0)44 312 50 20 www.aligned.ch info@aligned.ch Table of Contents 1.1 Introduction...3 1.2 Installation...3

More information

Title and Modify Page Properties

Title and Modify Page Properties Dreamweaver After cropping out all of the pieces from Photoshop we are ready to begin putting the pieces back together in Dreamweaver. If we were to layout all of the pieces on a table we would have graphics

More information

CREATING WEBSITES. What you need to build a website Part One The Basics. Chas Large. Welcome one and all

CREATING WEBSITES. What you need to build a website Part One The Basics. Chas Large. Welcome one and all Slide 1 CREATING WEBSITES What you need to build a website Part One The Basics Chas Large Welcome one and all Short intro about Chas large TV engineer, computer geek, self taught, became IT manager in

More information

CREATING ACCESSIBLE SPREADSHEETS IN MICROSOFT EXCEL 2010/13 (WINDOWS) & 2011 (MAC)

CREATING ACCESSIBLE SPREADSHEETS IN MICROSOFT EXCEL 2010/13 (WINDOWS) & 2011 (MAC) CREATING ACCESSIBLE SPREADSHEETS IN MICROSOFT EXCEL 2010/13 (WINDOWS) & 2011 (MAC) Screen readers and Excel Users who are blind rely on software called a screen reader to interact with spreadsheets. Screen

More information

CSSE 460 Computer Networks Group Projects: Implement a Simple HTTP Web Proxy

CSSE 460 Computer Networks Group Projects: Implement a Simple HTTP Web Proxy CSSE 460 Computer Networks Group Projects: Implement a Simple HTTP Web Proxy Project Overview In this project, you will implement a simple web proxy that passes requests and data between a web client and

More information

DATA COLLECTION. Slides by WESLEY WILLETT 13 FEB 2014

DATA COLLECTION. Slides by WESLEY WILLETT 13 FEB 2014 DATA COLLECTION Slides by WESLEY WILLETT INFO VISUAL 340 ANALYTICS D 13 FEB 2014 WHERE DOES DATA COME FROM? We tend to think of data as a thing in a database somewhere WHY DO YOU NEED DATA? (HINT: Usually,

More information

Programming Lab 1 (JS Hwk 3) Due Thursday, April 28

Programming Lab 1 (JS Hwk 3) Due Thursday, April 28 Programming Lab 1 (JS Hwk 3) Due Thursday, April 28 Lab You may work with partners for these problems. Make sure you put BOTH names on the problems. Create a folder named JSLab3, and place all of the web

More information

Eng 110, Spring Week 03 Lab02- Dreamwaver Session

Eng 110, Spring Week 03 Lab02- Dreamwaver Session Eng 110, Spring 2008 Week 03 Lab02- Dreamwaver Session Assignment Recreate the 3-page website you did last week by using Dreamweaver. You should use tables to control your layout. You should modify fonts,

More information

Cascading style sheets

Cascading style sheets Cascading style sheets The best way to create websites is to keep the content separate from the presentation. The best way to create websites is to keep the content separate from the presentation. HTML

More information

XML: some structural principles

XML: some structural principles XML: some structural principles Hayo Thielecke University of Birmingham www.cs.bham.ac.uk/~hxt October 18, 2011 1 / 25 XML in SSC1 versus First year info+web Information and the Web is optional in Year

More information

Zend Studio has the reputation of being one of the most mature and powerful

Zend Studio has the reputation of being one of the most mature and powerful Exploring the developer environment RAPID DEVELOPMENT PHP experts consider Zend Studio the most mature and feature-rich IDE for PHP. The latest version offers enhanced database manipulation and other improvements.

More information

Screen Scraping. Screen Scraping Defintions ( Web Scraping (

Screen Scraping. Screen Scraping Defintions (  Web Scraping ( Screen Scraping Screen Scraping Defintions (http://www.wikipedia.org/) Originally, it referred to the practice of reading text data from a computer display terminal's screen. This was generally done by

More information

Alpha College of Engineering and Technology. Question Bank

Alpha College of Engineering and Technology. Question Bank Alpha College of Engineering and Technology Department of Information Technology and Computer Engineering Chapter 1 WEB Technology (2160708) Question Bank 1. Give the full name of the following acronyms.

More information

Skills you will learn: How to make requests to multiple URLs using For loops and by altering the URL

Skills you will learn: How to make requests to multiple URLs using For loops and by altering the URL Chapter 9 Your First Multi-Page Scrape Skills you will learn: How to make requests to multiple URLs using For loops and by altering the URL In this tutorial, we will pick up from the detailed example from

More information

CIS 194: Homework 6. Due Friday, October 17, Preface. Setup. Generics. No template file is provided for this homework.

CIS 194: Homework 6. Due Friday, October 17, Preface. Setup. Generics. No template file is provided for this homework. CIS 194: Homework 6 Due Friday, October 17, 2014 No template file is provided for this homework. Download the markets.json file from the website, and make your HW06.hs Haskell file with your name, any

More information

Reading How the Web Works

Reading How the Web Works Reading 1.3 - How the Web Works By Jonathan Lane Introduction Every so often, you get offered a behind-the-scenes look at the cogs and fan belts behind the action. Today is your lucky day. In this article

More information

Software Development & Education Center PHP 5

Software Development & Education Center PHP 5 Software Development & Education Center PHP 5 (CORE) Detailed Curriculum Core PHP Introduction Classes & Objects Object based & Object Oriented Programming Three Tier Architecture HTML & significance of

More information

Web Security. Jace Baker, Nick Ramos, Hugo Espiritu, Andrew Le

Web Security. Jace Baker, Nick Ramos, Hugo Espiritu, Andrew Le Web Security Jace Baker, Nick Ramos, Hugo Espiritu, Andrew Le Topics Web Architecture Parameter Tampering Local File Inclusion SQL Injection XSS Web Architecture Web Request Structure Web Request Structure

More information

Introduction to Computer Science Web Development

Introduction to Computer Science Web Development Introduction to Computer Science Web Development Flavio Esposito http://cs.slu.edu/~esposito/teaching/1080/ Lecture 14 Lecture outline Discuss HW Intro to Responsive Design Media Queries Responsive Layout

More information

CS109 Data Science Data Munging

CS109 Data Science Data Munging CS109 Data Science Data Munging Hanspeter Pfister & Joe Blitzstein pfister@seas.harvard.edu / blitzstein@stat.harvard.edu http://dilbert.com/strips/comic/2008-05-07/ Enrollment Numbers 377 including all

More information

Introduction April 27 th 2016

Introduction April 27 th 2016 Social Web Mining Summer Term 2016 1 Introduction April 27 th 2016 Dr. Darko Obradovic Insiders Technologies GmbH Kaiserslautern d.obradovic@insiders-technologies.de Outline for Today 1.1 1.2 1.3 1.4 1.5

More information

An Online Interactive Database Platform For Career Searching

An Online Interactive Database Platform For Career Searching 22 Int'l Conf. Information and Knowledge Engineering IKE'18 An Online Interactive Database Platform For Career Searching Brandon St. Amour Zizhong John Wang Department of Mathematics and Computer Science

More information

.. Documentation. Release 0.4 beta. Author

.. Documentation. Release 0.4 beta. Author .. Documentation Release 0.4 beta Author May 06, 2015 Contents 1 Browser 3 1.1 Basic usages............................................... 3 1.2 Form manipulation............................................

More information

COS 116 The Computational Universe Laboratory 1: Web 2.0

COS 116 The Computational Universe Laboratory 1: Web 2.0 COS 116 The Computational Universe Laboratory 1: Web 2.0 Must be completed by the noon Tuesday, February 9, 2010. In this week s lab, you ll explore some web sites that encourage collaboration among their

More information

Assignment 0. Nothing here to hand in

Assignment 0. Nothing here to hand in Assignment 0 Nothing here to hand in The questions here have solutions attached. Follow the solutions to see what to do, if you cannot otherwise guess. Though there is nothing here to hand in, it is very

More information

Adding Content to Blackboard

Adding Content to Blackboard Adding Content to Blackboard Objectives... 2 Task Sheet for: Adding Content to Blackboard... 3 What is Content?...4 Presentation Type and File Formats... 5 The Syllabus Example... 6 PowerPoint Example...

More information

Scraping Sites that Don t Want to be Scraped/ Scraping Sites that Use Search Forms

Scraping Sites that Don t Want to be Scraped/ Scraping Sites that Use Search Forms Chapter 9 Scraping Sites that Don t Want to be Scraped/ Scraping Sites that Use Search Forms Skills you will learn: Basic setup of the Selenium library, which allows you to control a web browser from a

More information

HTML and CSS COURSE SYLLABUS

HTML and CSS COURSE SYLLABUS HTML and CSS COURSE SYLLABUS Overview: HTML and CSS go hand in hand for developing flexible, attractively and user friendly websites. HTML (Hyper Text Markup Language) is used to show content on the page

More information

Copyright 2014 Blue Net Corporation. All rights reserved

Copyright 2014 Blue Net Corporation. All rights reserved a) Abstract: REST is a framework built on the principle of today's World Wide Web. Yes it uses the principles of WWW in way it is a challenge to lay down a new architecture that is already widely deployed

More information

This document provides a concise, introductory lesson in HTML formatting.

This document provides a concise, introductory lesson in HTML formatting. Tip Sheet This document provides a concise, introductory lesson in HTML formatting. Introduction to HTML In their simplest form, web pages contain plain text and formatting tags. The formatting tags are

More information

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4 Prof. James She james.she@ust.hk 1 Selected Works of Activity 4 2 Selected Works of Activity 4 3 Last lecture 4 Mid-term

More information

Introduction to Web Development

Introduction to Web Development Introduction to Web Development Lecture 1 CGS 3066 Fall 2016 September 8, 2016 Why learn Web Development? Why learn Web Development? Reach Today, we have around 12.5 billion web enabled devices. Visual

More information

CS Exam 1 Review Suggestions - Spring 2017

CS Exam 1 Review Suggestions - Spring 2017 CS 328 - Exam 1 Review Suggestions p. 1 CS 328 - Exam 1 Review Suggestions - Spring 2017 last modified: 2017-02-16 You are responsible for material covered in class sessions and homeworks; but, here's

More information

Project 2 Implementing a Simple HTTP Web Proxy

Project 2 Implementing a Simple HTTP Web Proxy Project 2 Implementing a Simple HTTP Web Proxy Overview: CPSC 460 students are allowed to form a group of up to 3 students. CPSC 560 students each must take it as an individual project. This project aims

More information

(try adding using css to add some space between the bottom of the art div and the reset button, this can be done using Margins)

(try adding using css to add some space between the bottom of the art div and the reset button, this can be done using Margins) Pixel Art Editor Extra Challenges 1. Adding a Reset button Add a reset button to your HTML, below the #art div. Pixels go here reset The result should look something

More information

HTML4 TUTORIAL PART 2

HTML4 TUTORIAL PART 2 HTML4 TUTORIAL PART 2 STEP 1 - CREATING A WEB DESIGN FOLDER ON YOUR H DRIVE It will be necessary to create a folder in your H Drive to keep all of your web page items for this tutorial. Follow these steps:

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

CSE 143: Computer Programming II Spring 2015 HW2: HTMLManager (due Thursday, April 16, :30pm)

CSE 143: Computer Programming II Spring 2015 HW2: HTMLManager (due Thursday, April 16, :30pm) CSE 143: Computer Programming II Spring 2015 HW2: HTMLManager (due Thursday, April 16, 2015 11:30pm) This assignment focuses on using Stack and Queue collections. Turn in the following files using the

More information

Data Interfaces in R. Tushar B. Kute,

Data Interfaces in R. Tushar B. Kute, Data Interfaces in R Tushar B. Kute, http://tusharkute.com Data Interfaces In R, we can read data from files stored outside the R environment. We can also write data into files which will be stored and

More information