CIT 590 Homework 5 HTML Resumes

Similar documents
CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 3: SEP. 13TH INSTRUCTOR: JIAYIN WANG

Tips from the experts: How to waste a lot of time on this assignment

The name of our class will be Yo. Type that in where it says Class Name. Don t hit the OK button yet.

Tips from the experts: How to waste a lot of time on this assignment

This document provides a concise, introductory lesson in HTML formatting.

Note: This is a miniassignment and the grading is automated. If you do not submit it correctly, you will receive at most half credit.

c122jan2714.notebook January 27, 2014

CSE 5A Introduction to Programming I (C) Homework 4

CS 177 Recitation. Week 1 Intro to Java

Note: This is a miniassignment and the grading is automated. If you do not submit it correctly, you will receive at most half credit.

CIT 590 Homework 6 Fractions

Note: This is a miniassignment and the grading is automated. If you do not submit it correctly, you will receive at most half credit.

Com S 227 Spring 2018 Assignment points Due Date: Thursday, September 27, 11:59 pm (midnight) "Late" deadline: Friday, September 28, 11:59 pm

By Ryan Stevenson. Guidebook #2 HTML

Tips from the experts: How to waste a lot of time on this assignment

CME 193: Introduction to Scientific Python Lecture 1: Introduction

Documentation Requirements Computer Science 2334 Spring 2016

HTML/CSS Lesson Plans

Black Problem 2: Huffman Compression [75 points] Next, the Millisoft back story! Starter files

Module 10A Lecture - 20 What is a function? Why use functions Example: power (base, n)

Assignment #1: /Survey and Karel the Robot Karel problems due: 1:30pm on Friday, October 7th

Gearing Up for Development CS130(0)

Lecture 5. Defining Functions

CSCI 1060U Programming Workshop

CS 1803 Pair Homework 4 Greedy Scheduler (Part I) Due: Wednesday, September 29th, before 6 PM Out of 100 points

Using Dreamweaver CC. 5 More Page Editing. Bulleted and Numbered Lists

Student Success Guide

Assignment #1: and Karel the Robot Karel problems due: 3:15pm on Friday, October 4th due: 11:59pm on Sunday, October 6th

Things to note: Each week Xampp will need to be installed. Xampp is Windows software, similar software is available for Mac, called Mamp.

CSS BASICS. selector { property: value; }

Due: 9 February 2017 at 1159pm (2359, Pacific Standard Time)

Quick.JS Documentation

CONTENTS: What Is Programming? How a Computer Works Programming Languages Java Basics. COMP-202 Unit 1: Introduction

Mehran Sahami Handout #7 CS 106A September 24, 2014

Lab 1: Silver Dollar Game 1 CSCI 2101B Fall 2018

How the Internet Works

Our second exam is Thursday, November 10. Note that it will not be possible to get all the homework submissions graded before the exam.

1 Lecture 5: Advanced Data Structures

HTML. Based mostly on

CMSC 201 Fall 2016 Homework 6 Functions

CSE : Python Programming

Ruby on Rails Welcome. Using the exercise files

Programming with Python

CS Homework 11 p. 1. CS Homework 11

Introduction to Multimedia. MMP100 Spring 2016 thiserichagan.com/mmp100

Homework # 7 DUE: 11:59pm November 15, 2002 NO EXTENSIONS WILL BE GIVEN

CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 2: SEP. 8TH INSTRUCTOR: JIAYIN WANG

CS159 - Assignment 2b

In this project, you ll create a mystery letter that looks like each word has been cut from a different newspaper, magazine, comic or other source.

CSCI 305 Concepts of Programming Languages. Programming Lab 1

Django urls Django Girls Tutorial

(Updated 29 Oct 2016)

Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis

The Structure of the Web. Jim and Matthew

c122sep814.notebook September 08, 2014 All assignments should be sent to Backup please send a cc to this address

Project 1 Balanced binary

// class variable that gives the path where my text files are public static final String path = "C:\\java\\sampledir\\PS10"

1. Please, please, please look at the style sheets job aid that I sent to you some time ago in conjunction with this document.

Lab: Create JSP Home Page Using NetBeans

EE 422C HW 6 Multithreaded Programming

CS 1110, LAB 3: MODULES AND TESTING First Name: Last Name: NetID:

ACORN.COM CS 1110 SPRING 2012: ASSIGNMENT A1

Let s Make a Front Panel using FrontCAD

ORB Education Quality Teaching Resources

Week - 01 Lecture - 04 Downloading and installing Python

RIS shading Series #2 Meet The Plugins

HTML 5 Form Processing

CMPE 655 Fall 2016 Assignment 2: Parallel Implementation of a Ray Tracer

LeakDAS Version 4 The Complete Guide

Programming Assignment 2 ( 100 Points )

CMSC162 Intro to Algorithmic Design II Blaheta. Lab March 2019

Title: Jan 29 11:03 AM (1 of 23) Note that I have now added color and some alignment to the middle and to the right on this example.

CS 2316 Individual Homework 4 Greedy Scheduler (Part I) Due: Wednesday, September 18th, before 11:55 PM Out of 100 points

C++ Style Guide. 1.0 General. 2.0 Visual Layout. 3.0 Indentation and Whitespace

Advanced Programming Concepts. CIS 15 : Spring 2007

Tutorial. Activities. Code o o. Editor: Notepad Focus : Text manipulation & webpage skeleton. Open Notepad

VARIABLES. Aim Understanding how computer programs store values, and how they are accessed and used in computer programs.

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras

Sucuri Webinar Q&A HOW TO IDENTIFY AND FIX A HACKED WORDPRESS WEBSITE. Ben Martin - Remediation Team Lead

Web API Lab. The next two deliverables you shall write yourself.

Perch Documentation. U of M - Department of Computer Science. Written as a COMP 3040 Assignment by Cameron McKay, Marko Kalic, Riley Draward

Our second exam is Monday, April 3. Note that it will not be possible to get all the homework submissions graded before the exam.

Dreamweaver Basics Workshop

Smart formatting for better compatibility between OpenOffice.org and Microsoft Office

Master Syndication Gateway V2. User's Manual. Copyright Bontrager Connection LLC

CSS worksheet. JMC 105 Drake University

CSE 341, Autumn 2005, Assignment 3 ML - MiniML Interpreter

Lab 03 - x86-64: atoi

COMP 3500 Introduction to Operating Systems Project 5 Virtual Memory Manager

The Big Python Guide

Assignment 5: Due Friday April 8 at 6pm Clarification and typo correction, Sunday Apr 3 3:45pm

CMPSCI 187 / Spring 2015 Sorting Kata

A PROGRAM IS A SEQUENCE of instructions that a computer can execute to

CS131 Compilers: Programming Assignment 2 Due Tuesday, April 4, 2017 at 11:59pm

Basic Computer Skills: An Overview

Learn Python In One Day And Learn It Well: Python For Beginners With Hands-on Project. The Only Book You Need To Start Coding In Python Immediately

LECTURE 08: INTRODUCTION TO HTML

If Statements, For Loops, Functions

CSE 143: Computer Programming II Spring 2015 HW2: HTMLManager (due Thursday, April 16, :30pm)

Assignment 1: Search

Transcription:

CIT 590 Homework 5 HTML Resumes Purposes of this assignment Reading from and writing to files Scraping information from a text file Basic HTML usage General problem specification A website is made up of an HTML page. This HTML page is simply a text file with a certain format. Taking advantage of this fact, one can take an HTML template and create multiple pages with different values stored. This is exactly what we are going to do in this HW. Many websites use these kinds of scripts to mass generate HTML pages from databases, but we will just think of our text file as our database. Our goal will be to take a resume that is a simple text file and convert it into an HTML file that can be displayed by any web browser. Is knowledge of HTML necessary to complete this assignment? You do not need to know much HTML to do this assignment. In class, I will cover the basics that might be helpful. The TAs will also go over some examples during the recitations, so going to the recitation can give you a little bit more of an appreciation for what this format is. You can also do the assignment by just understanding that HTML is the language that your browser understands. It is a tag-based language where each tag provides some basic information for how the browser renders the text that you write between them. For example, <h1>rado</h1> will be rendered by your browser as a heading (saying Rado ) with large font. <h1> is the beginning of the heading. And </h1> indicates the ending of a heading. An HTML webpage is typically divided into a head section and a body section. We have provided you with a basic website template. We want you to retain the head section and write only the body section via your Python program. For more details about HTML your best resources will be to show up for class/recitation or to use the w3schools websites which can be found at www.w3schools.com. That website has a ton of information. In any case, really all that you will need for this assignment is the tag documentation. Also, note that the first part of the assignment where you just have to read from a file has absolutely nothing to do with HTML. The input text file We will read a simple text file that is supposed to represent a student s resume. The resume has some key points but can be somewhat unstructured. In particular, the order of some of

the information can definitely be different for different people. But these are the things that you definitely DO know about the resume: Every resume will have a name which will be written at the top. The top line in the text file has to be just the name. There will be a line in the file which contains an email address. It will be a single line with just the email address and nothing else. Every resume will have a list of projects. Projects are listed line by line below a heading called Projects. So an example of what you will see in your resume files is Projects Worked on the big huge robot simulator Built a bridge across the Pacific Ocean Designed an efficient algorithm to sort arrays faster than Superman The list of projects ends with a single line that looks like ----------'. That is, it will have at least 10 minus signs. While this is a weird requirement; we are imposing it to actually make the assignment easier for you. So please go with it. Every resume will have a list of courses done. Courses are listed in the following format: Courses - CIT590, AB120 or Courses :- Ray Tracing, Making money by lying You are allowed to assume that every single resume will just have this commaseparated list of courses with the word Courses being there in front of them. You do want to allow for some kind of punctuation marks after the word Courses. So your program should be able to look at the above example and then extract the courses without including the -' sign or the ':-' or any such punctuation that is in between the word Courses and the actual data we care about. Every course begins with a letter of the English alphabet regardless of whether we are putting the course ID or the course name in our resume. Every resume will have one or more degrees. A degree is to be found in a single line in the text file that contains the word 'University' and also contains one of the words 'Bachelor', 'Master' or 'Doctor'. Example: Bachelor of Science in Business Administration - University of Python or Master of Technology in Chemical Engineering - University of Mumbai Be careful with this. In particular, a project done in a university should not be incorrectly considered a degree.

Your first step in this assignment is to create this type of file. We provide an example in the HW folder. Please do not just blindly copy our resume. Write your own. It does not have to be totally accurate but the HW is going to be more fun if you make it close to reality. Functions for reading You have to write one function for each piece of information that you want to extract from this text file. Contrary to previous assignments, in this assignment we will not tell you what to call the functions or what arguments to pass to them. What we will tell you is that you will be graded on what you call the functions, how modular your code is and most importantly whether you have unit tested your functions or not. Since the resume file is pretty small, these functions are best done by reading the file into memory and then parsing the file using list and string manipulations. Keep in mind that functions that perform input/output are difficult to test, so limit your file reading to a single function so that you can better unit test the rest of your functions. Detecting name This one is easy. Just extract the first line. The one extra thing we want you to do, just for practice, is to raise an error if the first character is not 'A' through 'Z'. So yes, in this case, your program will crash. Just provide a message saying that the first line has to be just the name with proper capitalization. Detecting emails For detecting emails here is what you need to do. 1. Look for a line that has the @' character. 2. Also make sure that the last four characters of the email are either '.com' or '.edu'. We want those characters in those specific cases. So.COM' and.edu' and all variants like that will not be allowed. 3. Make sure there is a string that begins with a normal lowercase English character between the '@' and the ending ('.com' or the '.edu'). These rules will work for rivanov@seas.upenn.edu but will not work for @rivanov or rivanov@python.org. We are fully aware that these rules are inadequate. However, we want you to use these rules and only these rules. PLEASE DO NOT GOOGLE FOR A FUNCTION FOR THIS. Googling for solutions to your homework is an act of academic dishonesty and in this particular case you will get solutions involving crazy regular expressions, which is a topic we have not talked about in class. In general, your code should never involve a topic that we have not touched in class at all.

Remember if you already have expert knowledge of Python, you should not be in this class anyway. Detecting courses Look for the word Courses in the file and then extract the line that contains that word. Then make sure you extract the correct courses. In particular any random punctuation after the word Courses and before the first actual course needs to be ignored. You are allowed to assume that every course begins with a letter of the English alphabet. Detecting projects Look for the word Projects in the file. Each subsequent line is a project, until you hit a line that looks like this ----------'. That is NOT an underscore. It is (at least) ten minus signs put together. Declare that you have reached the end of the projects section if and only if you see a line that has at least 10 minus signs one after another. If you detect a blank line between project descriptions, ignore that line. Detecting education Look for the word University or university and the words that are indicative of a degree. Grab any line that has those words. Please grab the entire line. Writing HTML Once you have gathered all the pieces of information, we want you to write it in HTML format. Here are the steps for that. Again, please put each one of these steps into a separate function so that you can debug and test it more efficiently. Save the file resume.html that is provided to you in the same directory as your code. Open that file in an editor that understands HTML. Note that there is empty <body>. We are going to go in and fill the body. In order to do this, you will need to write a function which opens the resume.html file that we have provided and then writes information to it bit by bit. Also since we want to write the body part of this HTML file, we want to leave the head section untouched. In order to do that, you will probably need the following code. Note that you will have to change the path so that it points to wherever you put your resume.html file. f = open('c:\users\rado\desktop\resume.html', 'r+') lines = f.readlines() f.seek(0) f.truncate() del lines[-1] del lines[-1] f.writelines(lines) There are a few lines over here that are worth explaining since we haven t covered all of this. f.seek(0) moves the file pointer back to the top of the file. f.truncate() deletes everything

from the file from that point onwards. The combination of those 2 lines effectively delete every single line in the file. Why are we doing this? Our file looks something like <html> <head> <title>resume</title> </head> <body> </body> </html> What we want to do is stuff our resume content within the body tags to make it look like this: <html> <head> <title>resume</title> </head> <body> all the resume content goes over here </body> </html> In order to write proper HTML we will need the following helper function: surround_block(tag, text). This is a function that surrounds some text in an HTML block. For example, something that writes <div> with some text inside it. Now break the writing of the resume into the following steps: 1. In order to format the resume a little bit we have to make sure that all the text is enclosed within the following lines: <div id="page-wrap"> Since we can only write things line by line we will write the first line now, then fill up the actual content in a few steps and write the final 3 lines (look at item 6 below) to close the HTML file properly. So this initial step just writes one line which is <div id="pagewrap">. 2. The basic information section looks like <div> <h1> Rado </h1> <p> Email: rivanov@seas.upenn.edu </p>

Think about how you can do that. In particular, think about how surround_block can be used to achieve this. Write a function to make this intro section of the resume and then write it out to a file (make sure your file writing is done in a separate function because it is hard to unit test). 3. For the education section you will have to produce output similar to the following. Assume that you have got the data about the degrees by reading through the original file and stored that in some data structure (decide whether you want to use list, set, tuple or dictionary). Note that the <ul> tags contain an unordered list, where the <li> tags contain a list item. <div> <h2>education</h2> <ul> <li> Masters in Java </li> <li> Bachelors in Typing </li> </ul> 4. The projects section is actually quite similar to the education section. The only difference is that since project descriptions can be more verbose we want to wrap everything in a paragraph tag <p>. For an example, here is the kind of html that we want you to generate for the projects section. Write a function for doing the projects section. The arguments for this section should take in a list of projects. <div> <h2> Projects </h2> <ul> <li> <p> Worked on the foreground background segmentation problem, which is a classic problem in computer vision. </p> </li> <li> <p> Compiled data on grade inflation. Published several papers on this topic in the journal of ivy league humor. </p>

</li> </ul> Do not worry about the indentation. 5. For the courses section, we just want to list the courses. We do not want to create bullet points. We also want the heading to be a bit smaller in size. So here is an example of generated html. Write a function that takes in courses (you can decide whether you want a list of courses or a string of courses) and then writes out the html of the form below into the file. <div> <h3> Courses </h3> <span> Algorithms, Forex training </span> 6. After writing all this content we just need to remember to close the HTML tags. To do this, we need to write the following 3 lines to the file: </body> </html> Formatting your HTML file If you are interested in converting your HTML file into more readable HTML, you can use the following website: http://www.dirtymarkup.com/. Just copy your HTML and hit clean and you will obtain cleanly formatted code. Note that this will not affect your grade in any way. Suggestion for working in pairs This is an assignment where there is a natural split for people to work in a pair. One person can scrape the text file for the information. The other person can write the functions that will write to an HTML file. If you work in pairs, please indicate both people in the header of your python file as well as in the Canvas comment section. One Canvas submission per pair is enough. Evaluation You will get evaluated on the following dimensions. Please note that we are happy to give you feedback about your design in office hours. It is important that you understand that in this assignment there is more to it than just getting your code to work.

1. Does your program successfully convert a resume text file into a resume HTML file? 2. Does that HTML file get rendered in a browser? Test it out in at least 2 different browsers (hopefully everyone has multiple browsers on their computer these days). 3. Did you sufficiently test your program? 4. How did you approach reading from and writing to files? Since we do not give you specific functions, we need to evaluate your design. The points over here will be a combination of your style, your modularity, the kind of functions you chose write, the documentation for each function, etc. Submission and due date (not a Tuesday!) Write all your Python code in a file called makewebsite.py. Write your unit tests in a file called makewebsite_tests.py. Finally, submit both the text file that you used to make your resume and the HTML file that got generated. Call them resume.txt and resume.html, respectively. We do not need a nicely formatted HTML file. We just need the file that your program produced. Also, if we need any other files to run your unit tests, please submit them as well. Zip all your files and turn them in to Canvas before 11:59pm Thursday, October 19.