COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era. Spring Instructor: Grace Hui Yang

Similar documents
Math 2280: Introduction to Differential Equations- Syllabus

CSC 261/461 Database Systems Lecture 26. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 24. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

USC Viterbi School of Engineering

Introduction to UNIX

Wayne State University Department of Computer Science CSC 5991: Advanced Web Technologies. Functional (Scala) Programming for the Web.

SYLLABUS. 3. Total estimated time (hours/semester of didactic activities) 3.1 Hours per week 3 Of which: 3.2 course seminar/laboratory1 sem

ESET 369 Embedded Systems Software, Fall 2017

In this course, you need to use Pearson etext. Go to "Pearson etext and Video Notes".

CSci 4211: Data Communications and Computer Networks. Time: Monday and Wednesday 1 pm to 2:15 pm Location: Vincent Hall 16 Spring 2016, 3 Credits

San Jose State University College of Science Department of Computer Science CS185C, Introduction to NoSQL databases, Spring 2017

Syllabus COSC-051-x - Computer Science I Fall Office Hours: Daily hours will be entered on Course calendar (or by appointment)

COMP 401 COURSE OVERVIEW

ESET 369 Embedded Systems Software, Spring 2018

CMPS 182: Introduction to Database Management Systems. Instructor: David Martin TA: Avi Kaushik. Syllabus

Database Systems Management

Course and Contact Information. Course Description. Course Objectives

ECE 467 Section 201 Network Implementation Laboratory

Compilers for Modern Architectures Course Syllabus, Spring 2015

CPS352 - DATABASE SYSTEMS. Professor: Russell C. Bjork Spring semester, Office: KOSC 242 x4377

CSci 4211: Introduction to Computer Networks. Time: Monday and Wednesday 2:30 to 3:45 pm Location: Smith Hall 231 Fall 2018, 3 Credits

Computer Technology Division. Course Syllabus for: COMT Spring Instructor: Joe Bolen

In this course, you need to use Pearson etext. Go to "Pearson etext and Video Notes".

ECE573 Introduction to Compilers & Translators

CSc 2310 Principles of Programming (Java) Jyoti Islam

COURSE SYLLABUS ****************************************************************************** YEAR COURSE OFFERED: 2015

Oklahoma State University Institute of Technology Online Common Syllabus Spring 2019

CSCI 201L Syllabus Principles of Software Development Spring 2018

Internet Web Technologies ITP 104 (2 Units)

CS 572: Information Retrieval. Lecture 1: Course Overview and Introduction 11 January 2016

Syllabus Revised 01/03/2018

SECURITY ANALYSIS FINC 4660

CPS352 - DATABASE SYSTEMS. Professor: Russell C. Bjork Spring semester, Office: KOSC 242 x4377

Object-Oriented Programming for Managers

IT 341 Fall 2017 Syllabus. Department of Information Sciences and Technology Volgenau School of Engineering George Mason University

CPS352 Database Systems Syllabus Fall 2012

CMPE012 Computer Engineering 12 (and Lab) Computing Systems and Assembly Language Programming. Summer 2009

ESET 349 Microcontroller Architecture, Fall 2018

Syllabus Revised 08/21/17

Course and Contact Information. Course Description. Course Objectives

San Jose State University College of Science Department of Computer Science CS185C, NoSQL Database Systems, Section 1, Spring 2018

CS 3030 Scripting Languages Syllabus

B. Subject-specific skills B1. Problem solving skills: Supply the student with the ability to solve different problems related to the topics

TEACHING & ASSESSMENT PLAN

COSC 115A: Introduction to Web Authoring Fall 2014

EECE.2160: ECE Application Programming Spring 2017

Introduction to CS 4604

2/4/2019 Week 3- A Sangmi Lee Pallickara

University At Buffalo COURSE OUTLINE. A. Course Title: CSE 487/587 Information Structures

San José State University Computer Science Department CS157A: Introduction to Database Management Systems Sections 5 and 6, Fall 2015

Big Sandy Community and Technical College. Course Syllabus

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

CSE 114, Computer Science 1 Course Information. Spring 2017 Stony Brook University Instructor: Dr. Paul Fodor

IST659 Spring 2016 Huang Syllabus Data Administration Concepts and Database Management

Programming Languages CSCE

HCI-4/631 Software Architectures for User Interfaces, Fall 2006

CSCI 434 INTRODUCTION TO LOCAL AREA NETWORKS (SPRING 2015)

Operating Systems, Spring 2015 Course Syllabus

Programming 2. Outline (112) Lecture 0. Important Information. Lecture Protocol. Subject Overview. General Overview.

COLLEGE OF DUPAGE CIS 2542 Advanced C++ with Data Structure Applications Course Syllabus

Cleveland State University

CS 3270 Mobile Development for Android Syllabus

IST659 Spring2015 M001 Wang Syllabus Data Administration Concepts and Database Management

TEACHING & ASSESSMENT (T & A) PLAN College of Economics Management and Information Systems Department of Information Systems

VE281 Data Structures and Algorithms. Introduction and Asymptotic Algorithm Analysis

Course Syllabus. Course Information

GPU Programming and Architecture: Course Overview

CMPUT 391 Database Management Systems. Fall Semester 2006, Section A1, Dr. Jörg Sander. Introduction

CIS 3308 Web Application Programming Syllabus

INF 315E Introduction to Databases School of Information Fall 2015

Central Washington University Department of Computer Science Course Syllabus

/ Cloud Computing. Recitation 2 January 19 & 21, 2016

CS 1044: Introduction to Programming in C++

COSC 115: Introduction to Web Authoring Fall 2013

CSCI 6312 Advanced Internet Programming

Office Hours Time: Monday/Wednesday 12:45PM-1:45PM (SB237D) More Information: Office Hours Time: Monday/Thursday 3PM-4PM (SB002)

CS GPU and GPGPU Programming Lecture 1: Introduction. Markus Hadwiger, KAUST

Web Programming Spring 2010

CSE 167: Introduction to Computer Graphics. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2016

CS 3030 Scripting Languages Syllabus

Oklahoma State University Institute of Technology Face-to-Face Common Syllabus Fall 2017

Linear Algebra Math 203 section 003 Fall 2018

Lecture 27: Learning from relational data

CARTO UNIVERSITY GROUP. Syllabus GEO 445/545 Computer-assisted Cartography Winter December 18, 2013

San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1, 2, and 3, Spring 2018

Data Mining. Jeff M. Phillips. January 12, 2015 CS 5140 / CS 6140

San José State University College of Science / Department of Computer Science Introduction to Database Management Systems, CS157A-3-4, Fall 2017

KOMAR UNIVERSITY OF SCIENCE AND TECHNOLOGY (KUST)

Course Name: Database Design Course Code: IS414

San José State University Computer Science Department CS49J, Section 3, Programming in Java, Fall 2015

TEACHING & ASSESSMENT (T & A) PLAN

San Jose State University College of Science Department of Computer Science CS151, Object-Oriented Design, Sections 1,2 and 3, Spring 2017

Oklahoma State University Institute of Technology Face-to-Face Common Syllabus Fall 2017

TEACHING & ASSESSMENT PLAN

Welcome to CompSci 201

CS 378 Big Data Programming

Info Sys 422/722 & ISyE 722. Computer Based Data Management. Fall, 2016

San José State University Department of Computer Science CS 166 / SE 166, Information Security, Section 4, spring, 2017

CS535: Interactive Computer Graphics

CS 245: Database System Principles

Transcription:

COSC-589 Web Search and Sense-making Information Retrieval In the Big Data Era Spring 2016 Instructor: Grace Hui Yang The Web provides abundant information which allows us to live more conveniently and make quicker decisions. At the same time, the growth of the Web and the improvements in data creation, collection, and use have lead to tremendous increase in the amount and complexity of the data that a search engine needs to handle. The increase of the magnitude and complexity of the data has become a major drive for new data analytics algorithms and technologies that are scalable, highly interactive, and able to handle complex and dynamic information seeking tasks in the big data era. How to effectively and efficiently search for the documents relevant to our information needs and how to extract the valuable information and make sense out from big data are the subjects of this course. The course will cover Web search theory and techniques, including basic probabilistic theory, representations of documents and information needs, various retrieval models, link analysis, classification and recommender systems. The course will also cover programming models that allow us to easily distribute computations across large computer clusters. In particular, we will teach Apache Spark, which is an open-source cluster computing framework that has soon become the state-of-the-art for big data programming. The course is featured in step-by-step weekly/bi-weekly small assignments which composes a large big data project, such as building Google s PageRank on the entire Wikipedia. Students will be provided knowledge to Spark, Scala, Web search engines, and Web recommender systems with a focus on search engine design and "thinking at scale. Prerequisites: None. Course Website: https://piazza.com/georgetown/spring2016/cosc589/home Class Time: Monday and Wednesday 1100-1215 Venue: Intercultural Center 205 Blackboard: Homework submission Textbooks: Information Retrieval: Implementing and Evaluating Search Engines By Stefan Büttcher, Charles L. A. Clarke and Gordon V. Cormack Publisher: The MIT Press July 2010 Learning Spark: Lightning-Fast Big Data Analysis By Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia Publisher: O'Reilly Media

January 2015 Programming in Scala, Second Edition. A Comprehensive Step-by-Step Guide By Martin Odersky, Lex Spoon, and Bill Venners Publisher: Artima January 2011 Lecturer: Prof. Grace Hui Yang Email: gh243@georgetown.edu Office Phone: 202 687 6355 Website: http://www.cs.georgetown.edu/~huiyang Teaching Assistants: Jiyun Luo (jl1749@georgetown.edu) Website: http://cs-class.uis.georgetown.edu/~jl1749/homepage/ Office Hours: Prof. Grace Hui Yang St Mary s Hall 338 Tuesday 1-2pm TA hours St Mary s Hall TA room 330 Wednesday evening 6-7:30pm

Grading: Midterm Exam 15% Assignments 80% Pop Quizzes 5% Bonus - Optional Assignment Problems 20% Homework policy: All homework should be submitted through Blackboard. Homework is due by 11:59pm on the due date. Three late days in total are allowed without penalty for the entire semester. For instance, you may be late by 1 day for homework 1 and be late by 2 days for homework 2. Once the three-late-dates are used, you will be penalized according to the policy below: a penalty of 50% will be applied for homework submitted within the next 24 hours. zero credit will be assigned after that. All homework (even with zero credit) must be turned in order to pass the course. Integrity policy: All experimental results turned in must be true. No copying/cheating is allowed. Please check Georgetown's Honor system.

1/13/16 Introduction Spark Installation Chapter 1, 2 Assignment 1: Due 1/20/16 Install Spark 1/18/16 Break: Martin Luther King Day 1/20/16 Scala Crash Course Scala Book Ch 2,3,7,8 Assignment 2: Due 1/27/16 Scala practice Due: Assignment 1 1/25/16 No class 1/27/16 Scala Crash Course SBT Passing functions Scala Book Ch 2,3,7,8 Chapter 2 Assignment 3: Due 2/3/16 Scala practice Due: Assignment 2

2/1/16 RDD File Operations Chapter 3 Chapter 5 2/3/16 RDD Transformation Actions Spark UI Chapter 3 Chapter 7 Assignment 4: Due 2/10 Word Count Due: Assignment 3 2/8/16 RDD Transformation Actions Write SBT Build File Chapter 3 Chapter 2 2/10/16 RDD Persistence Regular Expression Chapter 3 Scala Book Section 26.7 Assignment 5: Due 2/17 Extract Patterns Due: Assignment 4

2/15/16 Break: President Day 2/17/16 Text Processing Bag of Words Query and Documents Vector Space Model IR Book Chapter 2 IR Book Chapter 3 Assignment 6: Due 3/2 Download Wikipedia Dump Clean the dataset Due: Assignment 5 2/22/16 Break: WSDM Week 2/24/16 Break: WSDM Week 2/29/16 Accumulators Broadcast Variables Chapter 6 3/2/16 XML Parser Scala Book Chapter 28 Assignment 7: Due 3/14 Parse the Wikipedia data using Scala.xml Word Count Due: Assignment 6 3/7/16 Spring Break 3/9/16 Spring Break 3/14/16 Midterm Review 3/16/16 Midterm Exam (in class, closed book) Due: Assignment 7

3/21/16 Break: ECIR 3/28/16 Easter Break 3/23/16 Easter Break 3/30/16 Web Search Link Graph IR Book Chapter 15 Assignment 8: Due 4/6 Build Wikipedia Link graph Write to a file 4/4/16 PageRank Key/Value Pairs Chapter 4 IR Book Chapter 15 4/6/16 PageRank IR Book Chapter 15 PageRank paper Assignment 9: Due 4/20 Compute PageRank for Wikipedia pages Due: Assignment 8 4/11/16 Break: WWW Conference 4/13/16 Break: WWW Conference

4/18/16 Web Knowledge Base Topic-Oriented PageRank IR Book Chapter 15 Papers 4/20/16 Recommender Systems Chapter 11 IR Book Section 10.1 Assignment 10: Due 5/2 Compute Topic-oriented PageRank for Wikipedia Due: Assignment 9 4/25/16 Regression & Classification Chapter 11 Papers 4/27/16 Advanced Search Papers Assignment 11 (optional): Due 5/14 Secret 5/2/16 (Class End) Conclusion 5/7/16 Study Week Due: Assignment 10 5/9/16 Exam Week 5/14/16 Exam Week Due: Assignment 11 See Sample Lecture Notes on the Next Page. It is also available at http://people.cs.georgetown.edu/~huiyang/cosc-589/intro.pdf