Introduction to Web Mining for Social Scientists
Lecture 4: Web Scraping Workshop
Prof. Dr. Ulrich Matter (University of St. Gallen)
10/10/2018
1 First Steps in R: Part II

In the previous week we looked at the very basics of using R: how to initiate a variable, R as a calculator, data structures, functions, etc. All of this was rather focused on executing command after command, or a number of commands at once, in an interactive R session. Apart from the definition of a function, we haven't really looked at how to program with R. A large part of basic programming has to do with automating the execution of a number of commands conditional on some control statements. That is, we want to tell the computer to do something until a certain goal is reached. In the simplest case this boils down to a control flow statement that specifies an iteration, a so-called loop.

1.1 Loops

A loop is typically a sequence of statements that is executed a specific number of times. How often the code inside the loop is executed depends on a (hopefully) clearly defined control statement. If we know in advance how often the code inside the loop has to be executed, we typically write a so-called for-loop. If the number of iterations is not clearly known before executing the code, we typically write a so-called while-loop. The following subsections illustrate both of these concepts in R.

1.1.1 For-loops

In simple terms, a for-loop tells the computer to execute a sequence of commands for each case in a set of n cases. For example, a for-loop could be used to sum up each of the elements in a numeric vector of fixed length (thus the number of iterations is clearly defined). In plain English, the for-loop would state something like: "Start with 0 as the current total value; for each of the elements in the vector, add the value of this element to the current total value." Note how this logically implies that the loop will stop once the value of the last element in the vector is added to the total.
Let's illustrate this in R. Take the numeric vector c(1,2,3,4,5). A for-loop to sum up all elements can be implemented as follows:

# vector to be summed up
numbers <- c(1,2,3,4,5)
# initiate total
total_sum <- 0
# number of iterations
n <- length(numbers)

# start loop
for (i in 1:n) {
     total_sum <- total_sum + numbers[i]
}

# check result
total_sum
[1] 15
# compare with result of sum() function
sum(numbers)
[1] 15

In some situations a simple for-loop might not be sufficient. Within one sequence of commands there might be another sequence of commands that also has to be executed a number of times each time the first sequence of commands is executed. In such a case we speak of a nested for-loop. We can illustrate this easily by extending the example of the numeric vector above to a matrix for which we want to sum up the values in each column. Building on the loop implemented above, we would say: for each column j of a given numeric matrix, execute the for-loop defined above.

# matrix to be summed up
numbers_matrix <- matrix(1:20, ncol = 4)
numbers_matrix
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

# number of iterations for outer loop
m <- ncol(numbers_matrix)
# number of iterations for inner loop
n <- nrow(numbers_matrix)

# start outer loop (loop over columns of matrix)
for (j in 1:m) {
     # start inner loop: initiate total
     total_sum <- 0
     for (i in 1:n) {
          total_sum <- total_sum + numbers_matrix[i, j]
     }
     print(total_sum)
}
[1] 15
[1] 40
[1] 65
[1] 90

1.1.2 While-loops

In a situation where a program has to repeatedly run a sequence of commands but we don't know in advance how many iterations we need in order to reach the intended goal, a while-loop can help. In simple terms, a while-loop keeps executing a sequence of commands as long as a certain logical statement is true. The flow chart in Figure 1 illustrates this point. For example, a while-loop in plain English could state something like: "Start with 0 as the total; add 1.12 to the total until the total is larger than 20." We can implement this in R as follows.

Figure 1: While-loop illustration. Source: While-loop-diagram.svg.

# initiate starting value
total <- 0
# start loop
while (total <= 20) {
     total <- total + 1.12
}

# check the result
total
[1] 20.16

1.2 Loops and Web Scraping

The two types of loops are very helpful in many web scraping tasks. Note how the web scraping example of last week (the "blueprint") is only designed to run for one specific Amazon product review (based on the product id). We can easily imagine extending the scraper to gather more data. For example, we could first collect a bunch of product ids for which we want to collect all reviews. We could implement this with a for-loop that iterates through each of the product ids and stops once all of the product ids have been used. Alternatively, we could imagine an extension of the basic review scraper that would first scrape all the reviews of one product id and then continue to scrape all reviews of all the products that the reviewer of the initial review also reviewed, and so on, until we have collected a certain number of reviews (or collected reviews of a certain number of reviewers, etc.). The following extended examples show the practical use of loops in different web scraping contexts.

2 Web Scraping in Action

2.1 Extracting Voting Tables from the U.S. Senate

A simple but very practical web scraping task is to extract data from HTML tables on a website. If we have to do this only once, R might not even be necessary; we might get the data simply by marking the table in a web browser and copy-pasting it into a spreadsheet program such as Excel (and saving it as CSV, etc.). However, it is often the case that we have to repeatedly extract various tables from the same website. The following exercise shows how this can be done in the context of data on roll-call voting in the U.S. Senate. The scraper is made to extract all roll-call voting results for a given list of congresses, and combine
them in one table. The data will be automatically extracted from the official website of the U.S. Senate, where all data for the last few congresses are available on pages per session and congress. For example, one such URL points to the page providing the data for the first session of the 113th U.S. Congress. First, we inspect the source code with the developer tools and figure out how the URLs are constructed. Based on this, we define the header section of a new R script for this scraper. As we want to extract data on voting results from various congresses and sessions, we define the fixed variables CONGRESS and SESSION as vectors.

# Introduction to Web Data Mining
# Lecture 4: Roll Call Data Scraper (HTML Tables)
#
# This is a basic web scraper to automatically extract data on roll call
# vote outcomes in the U.S. Senate. The data is extracted directly from
# the official government website. See the official website for an example
# of the type of page to be scraped.
#
# U. Matter, October 2017

# PREAMBLE

# load packages
library(httr)
library(xml2)
library(rvest)

# initiate fix variables
BASE_URL <- " "
CONGRESS <- c(110:114)
SESSION <- c(1, 2)

Following the blueprint outlined in the previous week, we write the three components of the scraper. However, in this case we aim to place all the components in a for-loop in order to iterate through all the pages from which we want to extract the tables with voting results. The three components of our web scraper will thus form the body of the for-loop. That is, they build the sequence of commands that are executed sequentially until we have all the data we want to collect. From inspecting the website of the U.S. Senate, we learn that in order to collect all the roll-call data from the 110th to the 114th Congress, we have to iterate not only through each congress but also through each of the two sessions in one congress (each congress consists of two sessions). Thus, for each congress and each session per congress, we want to extract the voting data.
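Before adding any scraping code, the nested iteration can be verified on its own by generating the page names for all congress-session pairs. A minimal sketch, using the vote_menu naming pattern that the scraper relies on (no requests are sent here):

```r
# fix variables as in the scraper's preamble
CONGRESS <- c(110:114)
SESSION <- c(1, 2)

# generate all page names without contacting the server
pages <- character(0)
for (i in seq_along(CONGRESS)) {
  for (j in seq_along(SESSION)) {
    pages <- c(pages, paste0("vote_menu_", CONGRESS[i], "_", SESSION[j], ".htm"))
  }
}

pages[1]       # "vote_menu_110_1.htm"
length(pages)  # 10 pages: 5 congresses x 2 sessions
```

Printing the generated page names is a cheap way to catch indexing mistakes before the loop body issues any HTTP requests.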
This implies a nested for-loop: in the outer loop we iterate through individual congresses; in the inner loop (that is, given a specific congress), we iterate through the sessions. Another key aspect to understand before getting started is what the result of each iteration is and how we collect/merge the individual results. As the overall goal of the scraper is to extract data from HTML tables, a reasonable format to store the data of each iteration is a data.frame. Thus, each iteration will result in a data.frame, which implies that we have to store each of these data frames while running the loop. We can do this with a list. Before starting the loop, we initiate an empty list with all_tables <- list(NULL). Then, within the loop, we add each of the extracted tables (now objects of class data.frame) as an additional element of that list. The following code chunk contains the blueprint for the loop following this strategy (without the actual loop body, i.e., the three scraper components).

# initiate variables for iteration
n_congr <- length(CONGRESS)
n_session <- length(SESSION)
all_tables <- list(NULL)

# start iteration
for (i in 1:n_congr) {
     for (j in 1:n_session) {

          # ADD COMPONENTS I TO III HERE!

          # add resulting table to list
          rc_table_list <- list(rc_table)
          all_tables <- c(all_tables, rc_table_list)
     }
}

Note that in order to add an extracted table (here: a data frame called rc_table) to the list, we first have to put it in a list rc_table_list <- list(rc_table) and then add it to the list containing all tables: all_tables <- c(all_tables, rc_table_list). The code above does not do anything yet on its own. We have to fill in the three components containing the actual scraping tasks in the body of the loop. When developing each of the components it is helpful to write them for just one iteration (ignoring the loop for a moment). This way we can test each component step by step before iterating over it many times. A simple way to do this is to manually assign values to the index variables i and j: i <- 1; j <- 1. The first component (interaction with the server, parsing the response, etc.) is then straightforwardly implemented and tested as:

# I) Handle URL, HTTP request and response, parse HTML

# build the URL
page <- paste0("vote_menu_", CONGRESS[i], "_", SESSION[j], ".htm")
rc_url <- paste0(BASE_URL, page)

# request webpage, parse results
rc_resp <- GET(rc_url)
rc_html <- read_html(rc_resp)

As usual, we have to figure out (with the help of the developer tools) how to extract the specific part of the HTML document which contains the data of interest. In this particular case the XPath expression ".//*[@id='secondary_col2']/table" provides the result we are looking for in the second component:

# II) Extract the data of interest

# extract the table
rc_table_node <- html_node(rc_html, xpath = ".//*[@id='secondary_col2']/table")
rc_table <- html_table(rc_table_node)

Finally, in the last component, we prepare the extracted data for further processing.
When looking at the result of the previous component (head(rc_table)), we note that the extracted table does not actually contain information about which congress and session it is from. We add this information by adding two new columns.

# III) Format and save data for further processing

# add additional variables
rc_table$congress <- CONGRESS[i]
rc_table$session <- SESSION[j]

With this we have the extracted data from one iteration (one congress-session pair) in the form we want. Once we have tested each of the components and are happy with the overall result for one iteration, we can add them to the body of the loop and put all parts together.
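The list-growing idiom used inside the loop can also be checked in isolation, without any scraping. A minimal sketch with toy data frames (df1 and df2 are illustrative stand-ins for extracted tables, not part of the scraper):

```r
# toy data frames standing in for two extracted tables
df1 <- data.frame(vote = 1:2, result = c("Agreed to", "Rejected"))
df2 <- data.frame(vote = 3:4, result = c("Confirmed", "Rejected"))

# grow the list as in the loop above
all_tables <- list(NULL)
for (tab in list(df1, df2)) {
  # wrapping in list() appends the whole data frame as one element;
  # c(all_tables, tab) would instead append each column separately
  all_tables <- c(all_tables, list(tab))
}
length(all_tables)  # 3: the initial NULL plus the two data frames

# stack them, as done after the loop; rbind ignores the NULL element
big <- do.call("rbind", all_tables)
nrow(big)  # 4
```

This makes explicit why the extra list() wrapper is needed: c() concatenates list elements, so appending a bare data frame would splice its columns into all_tables one by one.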
# Introduction to Web Mining
# Lecture 4: Roll Call Data Scraper (HTML Tables)
#
# This is a basic web scraper to automatically extract data on roll call
# vote outcomes in the U.S. Senate. The data is extracted directly from
# the official government website. See the official website for an example
# of the type of page to be scraped.
#
# U. Matter, October 2017

# PREAMBLE

# load packages
library(httr)
library(xml2)
library(rvest)

# initiate fix variables
BASE_URL <- " "
CONGRESS <- c(110:114)
SESSION <- c(1, 2)

# SCRAPER

# initiate variables for iteration
n_congr <- length(CONGRESS)
n_session <- length(SESSION)
all_tables <- list(NULL)

# start iteration
for (i in 1:n_congr) {
     for (j in 1:n_session) {

          # I) Handle URL, HTTP request and response, parse HTML

          # build the URL
          page <- paste0("vote_menu_", CONGRESS[i], "_", SESSION[j], ".htm")
          rc_url <- paste0(BASE_URL, page)

          # request webpage, parse results
          rc_resp <- GET(rc_url)
          rc_html <- read_html(rc_resp)

          # II) Extract the data of interest

          # extract the table
          rc_table_node <- html_node(rc_html, xpath = ".//*[@id='secondary_col2']/table")
          # alternatively: html_node(rc_html, css = "table")
          rc_table <- html_table(rc_table_node)

          # III) Format and save data for further processing

          # add additional variables
          rc_table$congress <- CONGRESS[i]
          rc_table$session <- SESSION[j]

          # add resulting table to list
          rc_table_list <- list(rc_table)
          all_tables <- c(all_tables, rc_table_list)
     }
}

As a last step, once the loop has finished, we can stack the individual data frames together to get one large data frame, which we can then store locally as a CSV file to further work with the collected data.

# combine all tables in one:
big_table <- do.call("rbind", all_tables)

# write result to file
write.csv(x = big_table, file = "data/3_senate_rc.csv", row.names = FALSE)

The first rows and columns of the resulting CSV file:

Vote (Tally)   Result
442 (93-0)     Confirmed
441 (76-17)    Agreed to
440 (48-46)    Rejected
439 (70-25)    Agreed to
438 (50-45)    Rejected
437 (24-71)    Rejected

2.2 A Simple Text Scraper for Wikipedia

In this exercise we write an R script that looks up a bunch of terms in Wikipedia, parses the search results, extracts the text of the found page, and saves it locally as a text file. As usual, we first inspect the website with the developer tools and have a close look at the part of the website containing the search field. We recognize that the HTML form's action attribute indicates a relative link /w/index.php. This tells us that once a user hits enter to submit what she entered in the form, the search term will be further processed by a PHP script on Wikipedia's server. From this, however, we do not yet know how the data will be submitted, or in other words, how we have to formulate either a GET or POST request in order to mimic a user typing requests into the search field. In order to understand how the search function on Wikipedia pages works under the hood, we open the Network panel in the Firefox Developer Tools and switch the HTML filter on (as we are only interested in the traffic related to HTML documents). We then type "Donald Trump" in the search field of the Wikipedia page and hit enter. The first entry of the network panel shows us the first transfer recorded after we hit enter.
It tells us that the search function works by sending a GET request, with a URL pointing to the PHP script discovered above, to the server. We can copy the exact URL of the GET request by right-clicking on it in the network panel and selecting Copy/Copy URL, and we can then verify that this is actually how the Wikipedia search function works by pasting the copied URL back into the Firefox address bar and hitting enter. We can then test whether we correctly understand how the URL for a query needs to be constructed by replacing the Donald+Trump part with Barack+Obama and seeing what
we get. Based on our insights about how the search field on Wikipedia works, we can start implementing our scraper. In the documentation of this script it is helpful to point out that there are two important types of URLs to be considered here: one as an example of a page to scrape data from, and one pointing to the search function. Since different parts of the URL to Wikipedia's search function will come in handy, we define the parsed URL from our Donald Trump example as a fixed variable. The aim of the scraper is to extract the text of the returned search result (the found Wikipedia entry) and store it locally in a text file. Therefore, we already define an output directory (RESULTS_DIR <- "data/wikipedia") where results should be stored.

# Introduction to Web Mining
# Lecture 4: Wikipedia Search Form Scraper
#
# This is a basic web scraper to automatically look up search terms in
# Wikipedia and extract the text of the returned page. See the Wikipedia
# website for an example of the type of page to be scraped, and the
# Network panel exercise above for the type of URL used by Wikipedia's
# search function.
#
# U. Matter, October 2017

# PREAMBLE

# load packages
library(httr)
library(xml2)
library(rvest)
library(stringi)

# initiate fix variables
SEARCH_URL <- parse_url(" ")
SEARCH_TERM <- "Barack Obama"
RESULTS_DIR <- "data/wikipedia/"

As we have parsed the rather complex URL used to perform searches on Wikipedia from the example above, we can simply modify the resulting object by replacing the respective parameter (search): SEARCH_URL$query$search <- SEARCH_TERM, and then use the function build_url() to construct the URL for an individual request. The rest of the first component is straightforward from the blueprint.
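The parse-modify-rebuild pattern can be tried out on a stand-in URL first. A minimal sketch, assuming httr is installed (example.org and its query parameters are purely illustrative, not Wikipedia's actual search URL):

```r
library(httr)

# parse a stand-in search URL into its components (scheme, hostname,
# path, and a named list of query parameters)
u <- parse_url("https://example.org/w/index.php?search=Donald+Trump&title=Special:Search")

# replace the value of the 'search' parameter and rebuild the URL
u$query$search <- "Barack Obama"
new_url <- build_url(u)
new_url  # the new search term is escaped and inserted into the query string
```

Because parse_url() returns an ordinary R list, any parameter can be swapped out with plain list assignment before build_url() reassembles a valid, properly escaped URL.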
# I) URL, HANDLE HTTP REQUEST AND THE RESPONSE ----

# Build the URL (update search term)
SEARCH_URL$query$search <- SEARCH_TERM

# fetch the website via a HTTP GET request
URL <- build_url(SEARCH_URL)
search_result <- GET(URL)

# parse the content of the response (the html code)
search_result_html <- read_html(search_result)
# or, alternatively: body <- content(search_result)

In the second component, we first identify the part of the parsed HTML document that we want to extract. In the case of how Wikipedia pages are currently built, it turns out that a straightforward way to do this is to select all paragraphs (<p>) that are embedded in a <div>-tag of class mw-parser-output. The XPath expression ".//*[@class='mw-parser-output']/p" thus captures all the HTML elements with content of
interest. In order to extract the text from those elements we simply apply the html_text() function.

# II) filter HTML, extract data ----
content_nodes <- html_nodes(search_result_html,
                            xpath = ".//*[@class='mw-parser-output']/p")
content_text <- html_text(content_nodes)

Finally, in the last component we define the name of the text file to which we want to save the extracted text.[1]

# III) write text to file ----
filepath <- paste0(RESULTS_DIR,
                   stri_replace_all_fixed(str = SEARCH_TERM, " ", ""),
                   ".txt")
write(content_text, filepath)

Putting all parts together, we can start using this script to automate the extraction of text from Wikipedia for any search term. Given the previous exercise, it should be straightforward to tweak this script in order to extract text from various pages based on a number of search terms (via a loop).

# Introduction to Web Mining
# Lecture 4: Wikipedia Search Form Scraper
#
# This is a basic web scraper to automatically look up search terms in
# Wikipedia and extract the text of the returned page. See the Wikipedia
# website for an example of the type of page to be scraped, and the
# Network panel exercise above for the type of URL used by Wikipedia's
# search function.
#
# U. Matter, October 2017

# PREAMBLE

# load packages
library(httr)
library(xml2)
library(rvest)
library(stringi)

# initiate fix variables
SEARCH_URL <- parse_url(" ")
SEARCH_TERM <- "Barack Obama"
RESULTS_DIR <- "data/wikipedia/"

# I) URL, HANDLE HTTP REQUEST AND THE RESPONSE ----

# Build the URL (update search term)
SEARCH_URL$query$search <- SEARCH_TERM

# fetch the website via a HTTP GET request
URL <- build_url(SEARCH_URL)
search_result <- GET(URL)

# parse the content of the response (the html code)
search_result_html <- read_html(search_result)

# II) filter HTML, extract data ----
content_nodes <- html_nodes(search_result_html,
                            xpath = ".//*[@class='mw-parser-output']/p")
content_text <- html_text(content_nodes)

# III) write text to file ----
filepath <- paste0(RESULTS_DIR,
                   stri_replace_all_fixed(str = SEARCH_TERM, " ", ""),
                   ".txt")
write(content_text, filepath)

[1] The function stri_replace_all_fixed() is used here to automatically remove all the white space from the search term. Thus, in the case of a search with the term Donald Trump, the extracted data would be stored in a text file with the path data/wikipedia/DonaldTrump.txt.
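As suggested above, the script can be tweaked to iterate over several search terms via a loop. A minimal sketch, wrapping the three components into a hypothetical function scrape_wiki_text() (the function name and the SEARCH_TERMS vector are illustrative, not part of the original script; the loop at the end assumes SEARCH_URL and RESULTS_DIR are defined as in the preamble):

```r
library(httr)
library(xml2)
library(rvest)
library(stringi)

# hypothetical wrapper around components I to III of the script above
scrape_wiki_text <- function(search_term, search_url, results_dir) {
  # I) build the URL and fetch the page
  search_url$query$search <- search_term
  resp <- GET(build_url(search_url))
  html <- read_html(resp)
  # II) extract the paragraph text
  nodes <- html_nodes(html, xpath = ".//*[@class='mw-parser-output']/p")
  text <- html_text(nodes)
  # III) write the text to a file named after the search term
  filepath <- paste0(results_dir,
                     stri_replace_all_fixed(search_term, " ", ""),
                     ".txt")
  write(text, filepath)
}

SEARCH_TERMS <- c("Barack Obama", "Angela Merkel", "Ueli Maurer")

# given SEARCH_URL and RESULTS_DIR from the preamble:
# for (term in SEARCH_TERMS) {
#   scrape_wiki_text(term, SEARCH_URL, RESULTS_DIR)
#   Sys.sleep(1)  # be polite: pause between requests
# }
```

The loop is kept commented out here because it issues live HTTP requests; the Sys.sleep() call illustrates the usual courtesy of pausing between requests to the same server.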
More informationc122sep814.notebook September 08, 2014 All assignments should be sent to Backup please send a cc to this address
All assignments should be sent to p.grocer@rcn.com Backup please send a cc to this address Note that I record classes and capture Smartboard notes. They are posted under audio and Smartboard under XHTML
More informationCreating Codes with Spreadsheet Upload
Creating Codes with Spreadsheet Upload In order to create a code, you must first have a group, prefix and account set up and associated to each other. This document will provide instructions on creating
More informationGroup Administrator Guide
Get Started... 4 What a Group Administrator Can Do... 7 About Premier... 10 Use Premier... 11 Use the AT&T IP Flexible Reach Customer Portal... 14 Search and Listing Overview... 17 What s New in the Group
More informationYear 8 Computing Science End of Term 3 Revision Guide
Year 8 Computing Science End of Term 3 Revision Guide Student Name: 1 Hardware: any physical component of a computer system. Input Device: a device to send instructions to be processed by the computer
More informationUsing Dreamweaver CC. 5 More Page Editing. Bulleted and Numbered Lists
Using Dreamweaver CC 5 By now, you should have a functional template, with one simple page based on that template. For the remaining pages, we ll create each page based on the template and then save each
More informationHyper- Any time any where go to any web pages. Text- Simple Text. Markup- What will you do
HTML Interview Questions and Answers What is HTML? Answer1: HTML, or HyperText Markup Language, is a Universal language which allows an individual using special code to create web pages to be viewed on
More informationBeautifulSoup: Web Scraping with Python
: Web Scraping with Python Andrew Peterson Apr 9, 2013 files available at: https://github.com/aristotle-tek/_pres Roadmap Uses: data types, examples... Getting Started downloading files with wget : in
More informationHTML and CSS a further introduction
HTML and CSS a further introduction By now you should be familiar with HTML and CSS and what they are, HTML dictates the structure of a page, CSS dictates how it looks. This tutorial will teach you a few
More information5/10/2009. Introduction. The light-saber is a Jedi s weapon not as clumsy or random as a blaster.
The Hacking Protocols and The Hackers Sword The light-saber is a Jedi s weapon not as clumsy or random as a blaster. Obi-Wan Kenobi, Star Wars: Episode IV Slide 2 Introduction Why are firewalls basically
More informationQuick.JS Documentation
Quick.JS Documentation Release v0.6.1-beta Michael Krause Jul 22, 2017 Contents 1 Installing and Setting Up 1 1.1 Installation................................................ 1 1.2 Setup...................................................
More informationLab 4: Bash Scripting
Lab 4: Bash Scripting February 20, 2018 Introduction This lab will give you some experience writing bash scripts. You will need to sign in to https://git-classes. mst.edu and git clone the repository for
More informationCreating an with Constant Contact. A step-by-step guide
Creating an Email with Constant Contact A step-by-step guide About this Manual Once your Constant Contact account is established, use this manual as a guide to help you create your email campaign Here
More informationAligned Elements Importer V user manual. Aligned AG Tellstrasse Zürich Phone: +41 (0)
Aligned Elements Importer V2.4.211.14302 user manual Aligned AG Tellstrasse 13 8004 Zürich Phone: +41 (0)44 312 50 20 www.aligned.ch info@aligned.ch Table of Contents 1.1 Introduction...3 1.2 Installation...3
More informationTitle and Modify Page Properties
Dreamweaver After cropping out all of the pieces from Photoshop we are ready to begin putting the pieces back together in Dreamweaver. If we were to layout all of the pieces on a table we would have graphics
More informationCREATING WEBSITES. What you need to build a website Part One The Basics. Chas Large. Welcome one and all
Slide 1 CREATING WEBSITES What you need to build a website Part One The Basics Chas Large Welcome one and all Short intro about Chas large TV engineer, computer geek, self taught, became IT manager in
More informationCREATING ACCESSIBLE SPREADSHEETS IN MICROSOFT EXCEL 2010/13 (WINDOWS) & 2011 (MAC)
CREATING ACCESSIBLE SPREADSHEETS IN MICROSOFT EXCEL 2010/13 (WINDOWS) & 2011 (MAC) Screen readers and Excel Users who are blind rely on software called a screen reader to interact with spreadsheets. Screen
More informationCSSE 460 Computer Networks Group Projects: Implement a Simple HTTP Web Proxy
CSSE 460 Computer Networks Group Projects: Implement a Simple HTTP Web Proxy Project Overview In this project, you will implement a simple web proxy that passes requests and data between a web client and
More informationDATA COLLECTION. Slides by WESLEY WILLETT 13 FEB 2014
DATA COLLECTION Slides by WESLEY WILLETT INFO VISUAL 340 ANALYTICS D 13 FEB 2014 WHERE DOES DATA COME FROM? We tend to think of data as a thing in a database somewhere WHY DO YOU NEED DATA? (HINT: Usually,
More informationProgramming Lab 1 (JS Hwk 3) Due Thursday, April 28
Programming Lab 1 (JS Hwk 3) Due Thursday, April 28 Lab You may work with partners for these problems. Make sure you put BOTH names on the problems. Create a folder named JSLab3, and place all of the web
More informationEng 110, Spring Week 03 Lab02- Dreamwaver Session
Eng 110, Spring 2008 Week 03 Lab02- Dreamwaver Session Assignment Recreate the 3-page website you did last week by using Dreamweaver. You should use tables to control your layout. You should modify fonts,
More informationCascading style sheets
Cascading style sheets The best way to create websites is to keep the content separate from the presentation. The best way to create websites is to keep the content separate from the presentation. HTML
More informationXML: some structural principles
XML: some structural principles Hayo Thielecke University of Birmingham www.cs.bham.ac.uk/~hxt October 18, 2011 1 / 25 XML in SSC1 versus First year info+web Information and the Web is optional in Year
More informationZend Studio has the reputation of being one of the most mature and powerful
Exploring the developer environment RAPID DEVELOPMENT PHP experts consider Zend Studio the most mature and feature-rich IDE for PHP. The latest version offers enhanced database manipulation and other improvements.
More informationScreen Scraping. Screen Scraping Defintions ( Web Scraping (
Screen Scraping Screen Scraping Defintions (http://www.wikipedia.org/) Originally, it referred to the practice of reading text data from a computer display terminal's screen. This was generally done by
More informationAlpha College of Engineering and Technology. Question Bank
Alpha College of Engineering and Technology Department of Information Technology and Computer Engineering Chapter 1 WEB Technology (2160708) Question Bank 1. Give the full name of the following acronyms.
More informationSkills you will learn: How to make requests to multiple URLs using For loops and by altering the URL
Chapter 9 Your First Multi-Page Scrape Skills you will learn: How to make requests to multiple URLs using For loops and by altering the URL In this tutorial, we will pick up from the detailed example from
More informationCIS 194: Homework 6. Due Friday, October 17, Preface. Setup. Generics. No template file is provided for this homework.
CIS 194: Homework 6 Due Friday, October 17, 2014 No template file is provided for this homework. Download the markets.json file from the website, and make your HW06.hs Haskell file with your name, any
More informationReading How the Web Works
Reading 1.3 - How the Web Works By Jonathan Lane Introduction Every so often, you get offered a behind-the-scenes look at the cogs and fan belts behind the action. Today is your lucky day. In this article
More informationSoftware Development & Education Center PHP 5
Software Development & Education Center PHP 5 (CORE) Detailed Curriculum Core PHP Introduction Classes & Objects Object based & Object Oriented Programming Three Tier Architecture HTML & significance of
More informationWeb Security. Jace Baker, Nick Ramos, Hugo Espiritu, Andrew Le
Web Security Jace Baker, Nick Ramos, Hugo Espiritu, Andrew Le Topics Web Architecture Parameter Tampering Local File Inclusion SQL Injection XSS Web Architecture Web Request Structure Web Request Structure
More informationIntroduction to Computer Science Web Development
Introduction to Computer Science Web Development Flavio Esposito http://cs.slu.edu/~esposito/teaching/1080/ Lecture 14 Lecture outline Discuss HW Intro to Responsive Design Media Queries Responsive Layout
More informationCS109 Data Science Data Munging
CS109 Data Science Data Munging Hanspeter Pfister & Joe Blitzstein pfister@seas.harvard.edu / blitzstein@stat.harvard.edu http://dilbert.com/strips/comic/2008-05-07/ Enrollment Numbers 377 including all
More informationIntroduction April 27 th 2016
Social Web Mining Summer Term 2016 1 Introduction April 27 th 2016 Dr. Darko Obradovic Insiders Technologies GmbH Kaiserslautern d.obradovic@insiders-technologies.de Outline for Today 1.1 1.2 1.3 1.4 1.5
More informationAn Online Interactive Database Platform For Career Searching
22 Int'l Conf. Information and Knowledge Engineering IKE'18 An Online Interactive Database Platform For Career Searching Brandon St. Amour Zizhong John Wang Department of Mathematics and Computer Science
More information.. Documentation. Release 0.4 beta. Author
.. Documentation Release 0.4 beta Author May 06, 2015 Contents 1 Browser 3 1.1 Basic usages............................................... 3 1.2 Form manipulation............................................
More informationCOS 116 The Computational Universe Laboratory 1: Web 2.0
COS 116 The Computational Universe Laboratory 1: Web 2.0 Must be completed by the noon Tuesday, February 9, 2010. In this week s lab, you ll explore some web sites that encourage collaboration among their
More informationAssignment 0. Nothing here to hand in
Assignment 0 Nothing here to hand in The questions here have solutions attached. Follow the solutions to see what to do, if you cannot otherwise guess. Though there is nothing here to hand in, it is very
More informationAdding Content to Blackboard
Adding Content to Blackboard Objectives... 2 Task Sheet for: Adding Content to Blackboard... 3 What is Content?...4 Presentation Type and File Formats... 5 The Syllabus Example... 6 PowerPoint Example...
More informationScraping Sites that Don t Want to be Scraped/ Scraping Sites that Use Search Forms
Chapter 9 Scraping Sites that Don t Want to be Scraped/ Scraping Sites that Use Search Forms Skills you will learn: Basic setup of the Selenium library, which allows you to control a web browser from a
More informationHTML and CSS COURSE SYLLABUS
HTML and CSS COURSE SYLLABUS Overview: HTML and CSS go hand in hand for developing flexible, attractively and user friendly websites. HTML (Hyper Text Markup Language) is used to show content on the page
More informationCopyright 2014 Blue Net Corporation. All rights reserved
a) Abstract: REST is a framework built on the principle of today's World Wide Web. Yes it uses the principles of WWW in way it is a challenge to lay down a new architecture that is already widely deployed
More informationThis document provides a concise, introductory lesson in HTML formatting.
Tip Sheet This document provides a concise, introductory lesson in HTML formatting. Introduction to HTML In their simplest form, web pages contain plain text and formatting tags. The formatting tags are
More informationELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She
ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4 Prof. James She james.she@ust.hk 1 Selected Works of Activity 4 2 Selected Works of Activity 4 3 Last lecture 4 Mid-term
More informationIntroduction to Web Development
Introduction to Web Development Lecture 1 CGS 3066 Fall 2016 September 8, 2016 Why learn Web Development? Why learn Web Development? Reach Today, we have around 12.5 billion web enabled devices. Visual
More informationCS Exam 1 Review Suggestions - Spring 2017
CS 328 - Exam 1 Review Suggestions p. 1 CS 328 - Exam 1 Review Suggestions - Spring 2017 last modified: 2017-02-16 You are responsible for material covered in class sessions and homeworks; but, here's
More informationProject 2 Implementing a Simple HTTP Web Proxy
Project 2 Implementing a Simple HTTP Web Proxy Overview: CPSC 460 students are allowed to form a group of up to 3 students. CPSC 560 students each must take it as an individual project. This project aims
More information(try adding using css to add some space between the bottom of the art div and the reset button, this can be done using Margins)
Pixel Art Editor Extra Challenges 1. Adding a Reset button Add a reset button to your HTML, below the #art div. Pixels go here reset The result should look something
More informationHTML4 TUTORIAL PART 2
HTML4 TUTORIAL PART 2 STEP 1 - CREATING A WEB DESIGN FOLDER ON YOUR H DRIVE It will be necessary to create a folder in your H Drive to keep all of your web page items for this tutorial. Follow these steps:
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationCSE 143: Computer Programming II Spring 2015 HW2: HTMLManager (due Thursday, April 16, :30pm)
CSE 143: Computer Programming II Spring 2015 HW2: HTMLManager (due Thursday, April 16, 2015 11:30pm) This assignment focuses on using Stack and Queue collections. Turn in the following files using the
More informationData Interfaces in R. Tushar B. Kute,
Data Interfaces in R Tushar B. Kute, http://tusharkute.com Data Interfaces In R, we can read data from files stored outside the R environment. We can also write data into files which will be stored and
More information
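The plain-English description of the for-loop in Section 1.1.1 can be sketched directly in R. The following is a minimal illustration (the vector values are made up for this example); it also contrasts the for-loop with a while-loop, where the number of iterations is not known in advance:

```r
# For-loop: start with 0 as the current total, then for each element
# in a numeric vector of fixed length, add its value to the total.
numbers <- c(2, 5, 1, 7)  # example vector (hypothetical values)

total <- 0
for (n in numbers) {
  total <- total + n
}
total  # 15

# While-loop: repeat as long as a condition holds. Here we keep
# doubling x until it exceeds 100; how many iterations this takes
# is not spelled out in advance by the code.
x <- 1
while (x <= 100) {
  x <- x * 2
}
x  # 128
```

Note how the for-loop stops automatically once the last element of the vector has been added, exactly as the plain-English version implies, whereas the while-loop stops only when its control condition evaluates to `FALSE`.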