The Web: Concepts and Technology. CS 584: Information Retrieval. Math & Computer Science Department, Emory University

The Web: Concepts and Technology
January 15: Course Overview

Today's Plan
Who am I?
What is this course about?
Logistics
Who are you?
Eugene Agichtein, CS 190: The Web: Concepts and Technology, Emory University, Spring 2009

Who am I: Background
Sept 2006-: Assistant Professor in the Math & CS department; Affiliate Faculty, Linguistics; Affiliate Faculty, Web Science @ Georgia Tech
Summer 2007: Visiting Researcher at Yahoo! Research
2004 to 2006: Postdoctoral Researcher at Microsoft Research, Text Mining, Search, and Navigation group, and MSN Search/Live
1998-2004: Ph.D. in Computer Science from Columbia University; dissertation on extracting structured relations from web-scale document repositories
1994-1998: B.S. in Engineering from The Cooper Union

Research: Developing Intelligent Systems to Help People Find Information Online
Search, browsing behavior
User-generated content, social networks
Human cognitive processes

Intelligent Information Access Lab
http://ir.mathcs.emory.edu/
Information retrieval & extraction, text & data mining
Web search user behavior, social networks, social media
Members: Ryan Kelly (Emory '10), Walt Askew (Emory '09), Abulimiti Aji (1st-year Ph.D.), Qi Guo (2nd-year Ph.D.), Yandong Liu (2nd-year Ph.D.), Alvin Grissom (2nd-year MS)
External collaborations:
Emory Libraries: Selden Deemer, Arthur Murphy
Psychology: Phil Wolff
Neuroscience: Beth Buffalo
School of Medicine: Ernie Garcia
And colleagues at Yahoo! Research, Microsoft Research, Motorola, and Georgia Tech

Course Outline
Web history and infrastructure
Web search and browsing
Applications: e-commerce, advertising
Abuse: spam, hacking, and the gray areas
Web services
Recommender systems
Online social networks
Online collaboration
Other topics: will depend on your interest!

What is the Internet?
The largest network of networks in the world.
Uses TCP/IP protocols and packet switching.
Runs on any communications substrate.
From Dr. Vinton Cerf, Co-Creator of TCP/IP

Structure of the Internet

Brief History of the Internet
1968 - DARPA (Defense Advanced Research Projects Agency) contracts with BBN (Bolt, Beranek & Newman) to create ARPAnet
1970 - First five nodes: UCLA, Stanford, UC Santa Barbara, U of Utah, and BBN
1974 - TCP specification by Vint Cerf
1984 - On January 1, the Internet with its 1000 hosts converts en masse to using TCP/IP for its messaging

Graph mining

Web Link Structure and Web Search
Browsing can't find these pages
Need a search engine
Bow Tie Structure (Broder et al., 2000)
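As a rough illustration of the bow-tie picture, the sketch below splits a tiny, made-up link graph into a strongly connected core, an IN set (pages that can reach the core), an OUT set (pages reachable from the core), and everything else. The page names and the `bow_tie` helper are hypothetical, and the core is simply taken to be the component containing a chosen seed page; this is not the procedure Broder et al. used on their crawl.

```python
from collections import defaultdict, deque


def reachable(graph, start):
    """Return every node reachable from start by following edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(links, seed):
    """Split a link graph into core / in / out / other relative to seed."""
    forward = defaultdict(set)   # page -> pages it links to
    backward = defaultdict(set)  # page -> pages linking to it
    nodes = {seed}
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)
        nodes.update((src, dst))

    downstream = reachable(forward, seed)   # pages the seed can reach
    upstream = reachable(backward, seed)    # pages that can reach the seed
    core = downstream & upstream            # mutually reachable with the seed
    return {
        "core": core,
        "in": upstream - core,
        "out": downstream - core,
        "other": nodes - upstream - downstream,
    }


# Hypothetical toy web: a 3-page core, one IN page, one OUT page, one island.
links = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("in1", "a"),
    ("c", "out1"),
    ("island1", "island2"),
]
parts = bow_tie(links, "a")
print(parts)
```

A surfer who starts in the core and only follows links forward reaches the core and OUT pages, but never `in1` or the island pages, which is one way to read "browsing can't find these pages".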

Web Search: Google, 1997-2000

Google Architecture
URL Server: sends lists of URLs to crawlers
Crawler: downloads web pages
Store Server: compresses and stores web pages into the repository
Indexer: reads the repository and uncompresses the documents; parses the documents; creates the forward index; parses out the links
URL Resolver: converts relative URLs to absolute URLs and then to docids; generates a database of links; puts the anchor text into the barrels
Sorter: generates the inverted index
Searcher: answers queries
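The indexing half of this pipeline can be sketched in a few lines. The code below is a toy, in-memory illustration only: the page data, the `DocIds` class, and all function names are assumptions made for the example, and it leaves out crawling, compression, barrels, and ranking entirely. It shows the indexer building a forward index, the URL resolver turning relative links into absolute URLs and docids, the sorter inverting the index, and a searcher answering AND queries.

```python
import re
from collections import defaultdict
from urllib.parse import urljoin


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocIds:
    """URL resolver helper: maps absolute URLs to small integer docids."""

    def __init__(self):
        self.by_url = {}

    def get(self, url):
        return self.by_url.setdefault(url, len(self.by_url))


def index_pages(pages):
    """Indexer + URL resolver. pages: {url: (text, [(href, anchor_text)])}."""
    docids = DocIds()
    forward_index = {}                # docid -> words on the page
    link_db = []                      # (source docid, target docid) pairs
    anchor_words = defaultdict(list)  # target docid -> words of anchors pointing at it
    for url, (text, anchors) in pages.items():
        doc = docids.get(url)
        forward_index[doc] = tokenize(text)
        for href, anchor_text in anchors:
            target = docids.get(urljoin(url, href))  # relative -> absolute -> docid
            link_db.append((doc, target))
            anchor_words[target].extend(tokenize(anchor_text))
    return docids, forward_index, link_db, anchor_words


def sort_into_inverted_index(forward_index, anchor_words):
    """Sorter: turn docid -> words into word -> sorted docids."""
    inverted = defaultdict(set)
    for source in (forward_index, anchor_words):
        for doc, words in source.items():
            for word in words:
                inverted[word].add(doc)
    return {word: sorted(docs) for word, docs in inverted.items()}


def search(inverted, query):
    """Searcher: return docids containing every query word."""
    postings = [set(inverted.get(word, ())) for word in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical two-page crawl result.
pages = {
    "http://example.com/a/index.html": (
        "Emory course on web search",
        [("../b.html", "information retrieval notes")],
    ),
    "http://example.com/b.html": ("Lecture notes on search engines", []),
}
docids, forward_index, link_db, anchor_words = index_pages(pages)
inverted = sort_into_inverted_index(forward_index, anchor_words)
print(search(inverted, "search"))
print(search(inverted, "information retrieval"))
```

Note that the second query matches the second page only through anchor text from the first page, which is why the resolver step stores anchor words under the target document rather than the source.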

Web Search: Google (continued), 2001-2007

Users learn to ignore ads!
Heat map: detect gaze position and duration using eye tracking
Box blindness

This was the surface web

The Invisible, Deep, or Hidden Web
Web sites or information that Google or other popular search engines are not fully indexing
Websites specifically excluded by the search engine

Web 2.0: It's Hard to Define, But I Know It When I See It
Emerging tech: Web services / APIs, folksonomies / content tagging, AJAX, RSS
Some apps you may know: Flickr, Google Maps, blogging & content syndication, Craigslist, Facebook, LinkedIn, Tribes, Ryze, Friendster, Del.icio.us, Upcoming.org, 43Things.com
Major retailers: Amazon APIs, Google AdSense API, Yahoo API, eBay API
"[This is] not my mom's Internet. It's changing, and it's changing because we're looking at the share-shifting, the time people are looking at TV, reading a magazine, listening to the radio; they're not replacing each other; they're coming together." (AOL Exec, May 2005)

Web 2.0: Evolution Towards a Read/Write Platform
Aspect | Web 1.0 (1993-2003) | Web 2.0 (2003 and beyond)
Description | Pretty much HTML pages viewed through a browser | Web pages, plus a lot of other content shared over the web, with more interactivity; more like an application than a page
Mode | Read | Write & Contribute
Primary unit of content | Page | Post / record
State | Static | Dynamic
Viewed through | Web browser | Browsers, RSS readers, anything
Architecture | Client-server | Web services
Content created by | Web coders | Everyone
Domain of | Geeks | Mass amateurization

Recommendations
Search, Recommendations
Items: products, web sites, blogs, news items, ...

Well-known recommender systems: Amazon and Netflix

Recommendation Types
Editorial
Simple aggregates: Top 10, Most Popular, Recent Uploads
Tailored to individual users: Amazon, Netflix, ...
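To make the difference between the last two types concrete, here is a minimal sketch that contrasts a simple aggregate ("most popular") with a recommendation tailored to one user. The viewing histories, names, and the overlap-based scoring are all assumptions chosen for illustration; this is not a description of how Amazon or Netflix compute recommendations.

```python
from collections import Counter


def ranked(scores, n):
    """Top-n items by score, ties broken alphabetically for repeatability."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:n]]


def most_popular(history, n=10):
    """Simple aggregate: the same list for every user."""
    counts = Counter(item for items in history.values() for item in items)
    return ranked(counts, n)


def tailored(history, user, n=10):
    """Tailored: score unseen items by how much their fans overlap with user."""
    mine = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(mine & items)   # shared items = crude similarity
        if overlap == 0:
            continue
        for item in items - mine:     # only suggest items the user lacks
            scores[item] += overlap
    return ranked(scores, n)


# Hypothetical viewing histories.
history = {
    "ann": {"matrix", "alien", "blade runner"},
    "bob": {"matrix", "alien", "brazil"},
    "cat": {"titanic", "notting hill", "matrix"},
    "dan": {"titanic", "notting hill"},
}
print(most_popular(history, 2))
print(tailored(history, "ann"))
print(tailored(history, "dan"))
```

The aggregate list is identical for everyone, while the tailored lists differ per user: `ann` is steered toward what `bob` watched because their histories overlap most.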

The Long Tail
Source: Chris Anderson (2004)

Netflix Challenge