CSC 380/530 Advanced Database Take-Home Midterm Exam (document version 1.0) SQL and PL/SQL

Similar documents
Course and Contact Information. Course Description. Course Objectives

Course and Contact Information. Course Description. Course Objectives

CSCI 4210 Operating Systems CSCI 6140 Computer Operating Systems Project 2 (document version 1.4) CPU Scheduling Algorithms

CSC343 Fall 2007 Assignment 2 SQL and Embedded SQL

Sql Server Check If Global Temporary Table Exists

MET CS 669 Database Design and Implementation for Business Term Project: Online DVD Rental Business

(Worth 50% of overall Project 1 grade)

Connecture Platform Manager

Unit Assessment Guide

San José State University Computer Science Department CS157A: Introduction to Database Management Systems Sections 5 and 6, Fall 2015

CSC 4710 / CSC 6710 Database Systems. Rao Casturi

INF 315E Introduction to Databases School of Information Fall 2015

Prescribing Services. EMIS Web Initial Extraction Guide

CS 1520 / CoE 1520: Programming Languages for Web Applications (Spring 2013) Department of Computer Science, University of Pittsburgh

San Jose State University College of Science Department of Computer Science CS185C, Introduction to NoSQL databases, Spring 2017

INFO 1103 Homework Project 2

Structure and Interpretation of Computer Programs Spring 2017 Mock Midterm 1

Course Requirements. Prerequisites Miscellaneous

CS157a Fall 2018 Sec3 Home Page/Syllabus

Mobile Forms Integrator

Our second exam is Thursday, November 10. Note that it will not be possible to get all the homework submissions graded before the exam.

Problem Description Earned Max 1 CSS 20 2 PHP 20 3 SQL 10 TOTAL Total Points 50

Aplia Instructor Brief Start Guide

Programming Standards: You must conform to good programming/documentation standards. Some specifics:

Computer Science 597A Fall 2008 First Take-home Exam Out: 4:20PM Monday November 10, 2008 Due: 3:00PM SHARP Wednesday, November 12, 2008

Creating the Data Layer

Homework 3 - Dumb Notes

CS450 - Database Concepts Fall 2015

EECE.2160: ECE Application Programming

CMN192B OFFICE: An Overview of Access and PowerPoint

Contents 1. OVERVIEW GUI Working with folders in Joini... 4

CS 3030 Scripting Languages Syllabus

ACCUPLACER Placement Validity Study Guide

CSC 261/461 Database Systems. Fall 2017 MW 12:30 pm 1:45 pm CSB 601

Once you see the screen below click on the MESA Dropbox

CS 2223 B15 Term. Homework 1 (100 pts.)

Database Concepts. Sixth Edition. David M. Kroenke David J. Auer. Instructor s Manual. Prepared by David J. Auer APPENDIX B

Vocational Gateway User guide for Tech Level centres

CS 3030 Scripting Languages Syllabus

1 (10) 2 (8) 3 (12) 4 (14) 5 (6) Total (50)

Barchard Introduction to SPSS Marks

CS 2110 Fall Instructions. 1 Installing the code. Homework 4 Paint Program. 0.1 Grading, Partners, Academic Integrity, Help

HEAP Online User Guide

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.

Practicing on word processor, spreadsheet, search engines, citation and referencing, and creating zipped files.

Submitting Assignments

Transfer Records to the State Archives with Exactly

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

Turnitin Software Professional Development

Your (printed!) Name: CS 1803 Exam 3. Grading TA / Section: Monday, Nov. 22th, 2010

San Jose State University College of Science Department of Computer Science CS185C, NoSQL Database Systems, Section 1, Spring 2018

CS 4800: Algorithms & Data. Lecture 1 January 10, 2017

COURSE SYLLABUS BMIS 326 INTRODUCTION TO ORACLE

MARKING CANVAS ASSIGNMENTS OFFLINE (INCLUDING MARKING ANONYMOUSLY)

Getting Started with Moodle 2.0

COMP 401 COURSE OVERVIEW

Prof. David Yarowsky

Part 1: Collecting and visualizing The Movie DB (TMDb) data

15-780: Problem Set #2

Creating and Manipulating Grade Scheme

Create SafeAssignment! 2. Viewing & Marking Submitted Papers! 4. Using DirectSubmit! 6. How to Submit Papers through DirectSubmit!

Database Management Systems,

15-110: Principles of Computing, Spring 2018

Assignment Submission HOWTO

Part 2: Programming Questions

CS44800 Practice Project SQL Spring 2019

Access Intermediate

Managing CSC within SoftPro 360

Roxen Content Provider

Programming Project 5: NYPD Motor Vehicle Collisions Analysis

Lecture 27: Learning from relational data

IBM DB2 Query Patroller. Administration Guide. Version 7 SC

Student Website / Portal Guidelines

Database Security MET CS 674 On-Campus/Blended

Creating databases using SQL Server Management Studio Express

Problem Set 6 Due: 11:59 Sunday, April 29

CS31 Discussion 1E. Jie(Jay) Wang Week1 Sept. 30

Oracle Adapter for Salesforce Lightning Winter 18. What s New

The Rapid Inquiry Facility (RIF)

Rochester Institute of Technology. Making personalized education scalable using Sequence Alignment Algorithm

RESPONSIVE SOLUTIONS, INC. CUSTOMER+ ADMIN MANUAL

Overview. Task A: Getting Started. Task B: Adding Support for Current Time

Overview of the Class and Introduction to DB schemas and queries. Lois Delcambre

Importing to WIRED Contact From a Database File. Reference Guide

Copy Data From One Schema To Another In Sql Developer

Alyssa Grieco. Data Wrangling Final Project Report Fall 2016 Dangerous Dogs and Off-leash Areas in Austin Housing Market Zip Codes.

CS 101, Spring 2016 March 22nd Exam 2

Get Table Schema In Sql Server 2005 Modify. Column Datatype >>>CLICK HERE<<<

McGill University School of Computer Science COMP-202A Introduction to Computing 1

COSC-211: DATA STRUCTURES HW5: HUFFMAN CODING. 1 Introduction. 2 Huffman Coding. Due Thursday, March 8, 11:59pm

CS Final Exam Review Suggestions - Spring 2018

SYSTEM CODE COURSE NAME DESCRIPTION SEM

CIT 590 Homework 5 HTML Resumes

a f b e c d Figure 1 Figure 2 Figure 3

Family Map Server Specification

Short Answer Questions (40 points)

Managing Your Database Using Oracle SQL Developer

Researcher information

Lesson 13 Transcript: User-Defined Functions

15-110: Principles of Computing, Spring 2018

Transcription:

CSC 380/530 Advanced Database Take-Home Midterm Exam (document version 1.0) SQL and PL/SQL The take-home midterm exam is due by 11:59:59 PM on Thursday, November 5, 2015 and must be submitted electronically. The take-home midterm exam will count as 20% of your final course grade. The take-home midterm exam is to be completed individually. Do not share your work with anyone else. No late submissions will be accepted. Getting Started Download the midterm-sql.zip file from the course website, then execute the scripts within your Oracle environment by executing them in this order: midterm.sql, midterm-data.sql, insert-words.sql, insert-counts.sql. These scripts create and populate four tables (shown in the database diagram below), i.e., m_books, m_counts, m_wordcounts, and m_words. The m_wordcounts table represents a many-to-many relationship between m_books and m_words, meaning that a given word in the m_words table may appear one or more times (as indicated by the m_wordcounts.count field) in one or more books. Browse through these tables to understand them, emailing me with any questions. Note that the data in these table was obtained programmatically (via Python) from publicly available texts of classic works (all from http://bartleby.com/).

General SQL Queries Write SQL queries to answer the questions below. Note that your SQL code should be as generalpurpose as possible (i.e., assume that the sample tables could have many more rows than what has been provided). 1. How many unique words are there across all books? Note that to be unique, a word must only appear once. 2. How many unique words are there per book? Design your query to display the results ordered alphabetically by book title. 3. What are the top eight most frequently occurring words across all books? Design your query to display the results in order starting from the most frequently occurring word. 4. How many unique words are there per author? How many unique words are there per editor? Design your query to combine the above results and display the results ordered alphabetically by author/editor name. 5. What is the percentage of word counts to total unique words per book? For example, if a word occurs 47 times in a work that contains 4700 unique words, the percentage for this word is 1%. SQL Queries to Handle Stopwords A stopword is a word that search engines often ignore. Typical stopwords include: the; of; and; to; or; not; that; there; and so on. From question 3 above, you probably found a lot of these stopwords. In this section, we will work further with stopwords to improve the query results of question 3. Write SQL creation scripts and queries as specified below. As above, your SQL code should be as general-purpose as possible. 6. Write a creation script for a new table called m_stopwords that has a unique id field and a foreign key wordid field that references the m_words table. 7. Write a single SQL query that inserts rows into the (assumed-to-be-empty) m_stopwords table by selecting the top 20 most frequently occurring words across all books. 8. Taking stopwords from the m_stopwords table into account, write an SQL query that selects the top 20 most frequently occurring words across all books, i.e., ignore stopwords. Be sure you do not delete or change any of the existing tables or their data (i.e, do not simply delete stopwords from m_wordcounts). 2

Oracle PL/SQL Functions and Procedures Write PL/SQL functions and procedures as specified below. 9. Create a PL/SQL function called generatepassword() that generates (and returns) a random password based on the words data. Generated passwords have the following general format: <word-1><integer><word-2><integer>...<integer><word-n> This function must take two integer inputs. The first input is n, which specifies the number of words to use in the generated password. More specifically, your function must randomly select n words that are unique across all books in the database (ignoring stopwords). Also note that a generated password must contain exactly n unique words (i.e., no duplicate words in a generated password). The second input is m, which specifies the number of digits the randomly generated integers should be between each word. As an example, if n is 4 and m is 2, possible randomly generated passwords are: requiem97heather59switzerland83baboon distinguishing21sharpens18zephyrs81caves influenced12palms19newark45hosting 3

10. Create a PL/SQL procedure called displayfrequencyreport() that displays a summary report for either a given book or all data within the database. The input to this procedure is bookid, a key to the m_books table. If the bookid input is 0, then data for all books should be displayed. In your summary report, display the word count for each word. Note that words with the same word count should be grouped together in your report. Therefore, also display the number of words in each group (i.e., the number of words with the given word count). Since the number of words in a group might be very large, display at most four of the words in each group. Order your output by word count, from highest to lowest. Your procedure must output the following (using the format shown here): Data analysis report as of <current-date> Unique words: 5717 Word Count # Words in Group Words (at most four) ------------ ------------------ --------------------------------- 1822 1 the 1258 1 of 1102 1 and 607 1 to 572 1 his 548 1 in 463 1 he 360 1 was 282 1 with 266 1 that 215 1 it 184 2 is, by 181 1 at 174 1 as 168 1 on 160 3 but, which, for 152 1 had......... 3 405 autumnal, leaves, gleam, mended 2 906 pound, limb, done, homeward 1 3281 groan, beautiful, resist, extra ------------ ------------------ --------------------------------- Note that the above data is based on a different dataset (so use it only as an example). 4

Submission Instructions To submit your work, create a single ZIP file (or compressed folder) containing all of your source files, including the code used to generate the SQL insert statements for the zipcodes table. Use your Saint Rose ID (e.g., goldschmidtd168) as the name of the ZIP file (i.e., goldschmidtd168.zip). Though entirely optional, you can include a simple README.txt file with notes or instructions. Email your ZIP file to goldschmidt@gmail.com (with a subject of CSC 380/530 Midterm Exam ). 5