Web Dynamics
Part 1 - Introduction
1.1 Dimensions of dynamics in the Web
1.2 Application examples
Summer Term 2010


Why Web Dynamics?
From Wikipedia: In physics, the term "dynamics" customarily refers to the time evolution of physical processes.

Which aspects of the Web are dynamic?
- Size: sites/pages added and deleted all the time

Number of sites on the Web
- 1998: 2,636,000 (IP addresses with an HTTP server)
- 1999: 4,662,…
- …: 7,128,000 (~40% public, 40% dead)
- 2001: 8,443,…
- …: 8,712,…
- …: 109 million sites (Netcraft)
- 2007: 433 million hosts on the Internet (ISC)

Size estimates for the (indexable) Web
- 1995: ~11.4 million docs (Bray)
- 1997: ~200 million docs (Bharat & Broder; sampling based on HotBot, AltaVista, Excite and Infoseek; overlap ~2%)
- 1998: >800 million docs (Lawrence & Giles)
- January 2005: 11.5 billion docs (Gulli & Signorini; sampling based on Google, MSN, Yahoo! and Ask/Teoma)
- 2005: 19.2 billion documents in the Yahoo! index
- 2008: >1 trillion documents counted by Google

More size estimates
Estimates based on the overlap of search engine results (from …)
[We will discuss this technique later in the course.]
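The overlap idea can be illustrated with a small capture-recapture calculation. This is a simplified sketch, not the exact procedure of any of the cited studies: it assumes that two engines' indexes behave like independent random samples of the Web, and the function names and toy numbers are hypothetical.

```python
def relative_size(frac_a_in_b: float, frac_b_in_a: float) -> float:
    """Estimate |A| / |B| for two search engine indexes A and B.

    frac_a_in_b: share of URLs sampled from A that B also indexes (~ |A∩B| / |A|)
    frac_b_in_a: share of URLs sampled from B that A also indexes (~ |A∩B| / |B|)
    """
    return frac_b_in_a / frac_a_in_b


def capture_recapture(size_a: float, size_b: float, overlap: float) -> float:
    """Lincoln-Petersen estimate of the total Web size N.

    If A and B were independent random samples of the Web, then
    |A∩B| / |B| ≈ |A| / N, hence N ≈ |A| * |B| / |A∩B|.
    """
    if overlap <= 0:
        raise ValueError("estimate is undefined without any overlap")
    return size_a * size_b / overlap


# Toy numbers (hypothetical, in millions of pages).
size_a, size_b, overlap = 400, 300, 120
print(relative_size(overlap / size_a, overlap / size_b))  # ratio |A| / |B|, about 1.33
print(capture_recapture(size_a, size_b, overlap))         # 1000.0
```

Real engines are not independent samples: they tend to index the same popular pages, so the observed overlap is larger than independence predicts and estimates of this kind tend to come out too low.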

The Web is infinite and growing
- Non-indexable Web not seen by search engines ("Deep Web" behind forms): est. 550 billion docs, est. 7.5 petabytes in 2000 (BrightPlanet)
- User-generated content (social networks, communities, wikis, blogs, …)
- Pages created on demand (e.g., a "next week" link in online calendars)
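Pages created on demand are one reason a crawler can never "finish": following a "next week" link always yields yet another page. The sketch below assumes a made-up calendar URL scheme and a hypothetical `calendar_links` helper standing in for fetching a page and extracting its links; it shows how a simple depth bound keeps the crawl finite. Real crawlers combine several such safeguards; this is only one of them.

```python
from collections import deque
from datetime import date, timedelta


def calendar_links(url: str) -> list[str]:
    """Hypothetical on-demand calendar: each week page links to the next week.

    Stands in for "fetch the page and extract its links"; the URL scheme
    /calendar?week=YYYY-MM-DD is made up for this example.
    """
    week = date.fromisoformat(url.split("=", 1)[1])
    return [f"/calendar?week={week + timedelta(weeks=1)}"]


def crawl(seed: str, max_depth: int) -> list[str]:
    """Breadth-first crawl that ignores links more than max_depth hops from the seed.

    Without the depth bound, the loop would keep generating new weeks forever.
    """
    seen = {seed}
    order = []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not expand this page any further
        for link in calendar_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


print(crawl("/calendar?week=2010-04-12", max_depth=3))
```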

Some social networks
Flickr (as of Oct 2009):
- 4+ billion photos (3 billion in Nov 2008, 2 billion in Nov 2007)
- 3 million new photos per day
Facebook (as of Apr 2010):
- 3+ billion new photos per month
- 60 million status updates per day
- 400 million active users (120 million in Nov 2008, 31 million in Apr 2007)
- 150,000 new users per day in Nov 2008 (100,000/day in Apr 2007)
Myspace (as of Apr 2007):
- 135 million users ("6th largest country on Earth")
- 2+ billion images (150,000 req/s), millions added daily
- 25 million songs
- 60 TB of videos
StudiVZ.net (as of Nov 2008):
- 11 million users
- 300 million images, 1 million added daily

Some social networks (continued)
[Chart: Flickr growth rate, from …]

MySpace infrastructure (as of 2008)
- Sending 100 gigabits of data per second to the Internet: 10 gigabits of HTML content, 90 gigabits of media (videos, pictures)
- 4,500 Web servers
- 1,200 cache servers
- 500 database servers
- Custom distributed file system
(from … and …)

Challenges: Size dynamics
How can a search engine deal with an infinite Web?
- Massively parallel, distributed architecture (MapReduce, Hadoop, etc.)
- Detect and remove noise (duplicates, spam, etc.)
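For the duplicate part of noise removal, a common approach is shingling: split each page into overlapping word sequences and compare the resulting sets. The sketch below computes the exact Jaccard resemblance of the two shingle sets; large-scale systems usually approximate it with min-hash sketches instead. The shingle length `k=4` and the threshold `0.9` are arbitrary assumptions, not values from the course.

```python
def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Set of contiguous k-word sequences ("shingles") of a document."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}  # very short document: a single (short) shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two shingle sets (1.0 = identical sets)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9, k: int = 4) -> bool:
    """Treat two pages as near-duplicates if their resemblance reaches the threshold."""
    return resemblance(a, b, k) >= threshold


page1 = "the quick brown fox jumps over the lazy dog"
page2 = "the quick brown fox jumps over the lazy cat"
print(round(resemblance(page1, page2), 2))  # 5 of 7 distinct shingles shared
print(is_near_duplicate(page1, page2))
```

The same resemblance score can also serve the content-dynamics question of whether a new crawl of a page differs enough from the stored version to count as a new version.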

Which aspects of the Web are dynamic?
- Size: pages added and deleted all the time
- Content: pages change all the time

Lifetime of versions on heise.de
High-frequency crawl of heise.de over one week in January 2009; a new version is counted whenever a news item is added or removed [R. Schenkel, ECIR 2010]

Evolution of the Web (Ntoulas et al., 2004)
Large-scale study, October 2002 to October 2003: weekly crawls of 154 large Web sites (up to 200,000 pages per site)

Average page creation per week
About 8% new pages created per week (A. Ntoulas, J. Cho, C. Olston: What's new on the Web? The evolution of the Web from a search engine perspective, WWW Conference, 2004)

How long do pages live?
About 40% of the pages are still available after one year (A. Ntoulas, J. Cho, C. Olston: What's new on the Web? The evolution of the Web from a search engine perspective, WWW Conference, 2004)
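As a rough back-of-the-envelope reading of the 40% figure, assume that every page has the same, constant chance of disappearing each week. This is a strong simplification that the study does not claim. The weekly survival probability s then satisfies s^52 = 0.4:

```python
import math


def weekly_survival(fraction_alive: float, weeks: int) -> float:
    """Constant per-week survival probability s with s ** weeks == fraction_alive.

    Assumes every page faces the same, time-independent risk of deletion.
    """
    return fraction_alive ** (1 / weeks)


s = weekly_survival(0.40, 52)
print(f"weekly survival ≈ {s:.4f}, weekly deletion ≈ {1 - s:.2%}")
# weekly survival ≈ 0.9825, weekly deletion ≈ 1.75%

half_life = math.log(0.5) / math.log(s)  # weeks until half the pages are gone
print(f"half-life ≈ {half_life:.0f} weeks")
# half-life ≈ 39 weeks
```

Under this model roughly 1.75% of pages vanish per week, which, set against about 8% new pages per week, is consistent with a Web that keeps growing.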

How frequently does a page change?
Most pages never change; the next-largest group changes at least weekly (A. Ntoulas, J. Cho, C. Olston: What's new on the Web? The evolution of the Web from a search engine perspective, WWW Conference, 2004)
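Change frequencies like these have to be estimated from repeated crawls, and a crawler only sees whether a page differs from its last visit, not how many times it changed in between. The sketch below contrasts the naive estimate with the bias-reduced estimator proposed by Cho & Garcia-Molina ("Estimating frequency of change", in the references), assuming Poisson changes and visits at a fixed interval; consult the paper for the derivation and its conditions. The access log is hypothetical.

```python
import math


def naive_rate(changes_detected: int, visits: int, interval: float) -> float:
    """Detected changes per unit time. Biased low, because several changes
    between two visits are only seen as one."""
    return changes_detected / (visits * interval)


def improved_rate(changes_detected: int, visits: int, interval: float) -> float:
    """Estimator from Cho & Garcia-Molina, assuming Poisson changes and visits
    every `interval` time units: r = -log((n - X + 0.5) / (n + 0.5)), rate = r / interval."""
    unchanged = visits - changes_detected
    r = -math.log((unchanged + 0.5) / (visits + 0.5))
    return r / interval


# Hypothetical log: a page visited once a day for 10 days, change seen on 9 visits.
print(naive_rate(9, 10, 1.0))     # 0.9 changes/day
print(improved_rate(9, 10, 1.0))  # about 1.95 changes/day
```

The naive estimate can never exceed one change per visit interval, whereas the corrected one accounts for changes that were hidden between visits.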

How much do pages change?
Most of the changes are minor (A. Ntoulas, J. Cho, C. Olston: What's new on the Web? The evolution of the Web from a search engine perspective, WWW Conference, 2004)

How large are pages?
Average size rose by about 15% in one year (A. Ntoulas, J. Cho, C. Olston: What's new on the Web? The evolution of the Web from a search engine perspective, WWW Conference, 2004)

More recent numbers
- Average size of Web pages more than tripled since 2003, from 93.7 KB to over 312 KB
- Average number of objects per Web page nearly doubled, from 25.7 to 49.9
- Since 1995, the average size of Web pages has increased 22-fold
- Since 1995, the average number of objects per Web page has increased 21.7-fold
(from …)

More recent charts (from …)

Challenges: Content dynamics
How can a search engine maintain a reasonably accurate snapshot of the Web?
- Model how and when documents are updated
- Recrawl policy based on expected changes
- Decide whether a page's content has changed (enough to replace the old version in the snapshot)
How can we maintain the Web of the past?
- Web archiving
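One simple way to connect a change model to a recrawl policy: if a page's changes follow a Poisson process with rate λ, the probability that it changed at least once in the t days since the last crawl is 1 - e^(-λt). The sketch below spends a fixed crawl budget on the pages most likely to be stale. It is a greedy heuristic for illustration only, with hypothetical URLs and rates; the crawling literature shows that putting most visits on the fastest-changing pages does not necessarily maximize overall freshness.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    change_rate: float       # estimated changes per day (lambda), e.g. from past crawls
    days_since_crawl: float


def prob_changed(page: Page) -> float:
    """P(at least one change since the last crawl) if changes follow a Poisson process."""
    return 1.0 - math.exp(-page.change_rate * page.days_since_crawl)


def pick_recrawls(pages: list[Page], budget: int) -> list[str]:
    """Greedy heuristic: spend the crawl budget on the pages most likely to be stale."""
    ranked = sorted(pages, key=prob_changed, reverse=True)
    return [p.url for p in ranked[:budget]]


pages = [
    Page("https://example.org/news", change_rate=2.0, days_since_crawl=1),
    Page("https://example.org/about", change_rate=0.01, days_since_crawl=30),
    Page("https://example.org/blog", change_rate=0.2, days_since_crawl=3),
]
print(pick_recrawls(pages, budget=2))  # news (~0.86) and blog (~0.45) before about (~0.26)
```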

Which aspects of the Web are dynamic?
- Size: pages added and deleted all the time
- Content: pages change all the time
- Structure: links added (and dropped) all the time

How frequently do links change?
25% new links created per week; 80% of links replaced within a year (A. Ntoulas, J. Cho, C. Olston: What's new on the Web? The evolution of the Web from a search engine perspective, WWW Conference, 2004)

Challenges: Structure dynamics
How can a search engine maintain a reasonably accurate snapshot of the Web graph?
- Massively parallel, distributed architecture (MapReduce, Hadoop, etc.)
- Distributed approximation algorithms for computing authority measures (PageRank)
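For reference, PageRank itself can be computed by power iteration. The single-machine sketch below is illustrative: each iteration is essentially a sparse matrix-vector product, which is the step that distributed frameworks such as MapReduce typically parallelize (map: emit rank shares along out-links; reduce: sum the shares per target page). The optional `init` argument is an assumption added here to show one way of coping with a changing link graph: restarting from the previous ranking instead of from scratch. The damping factor 0.85 is the commonly used default.

```python
from typing import Dict, List, Optional


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, tol: float = 1e-10,
             max_iter: int = 200, init: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Power iteration for PageRank on an adjacency list {page: [linked pages]}."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    if init:
        # Warm start from a previous ranking; pages new to the graph get 1/n.
        rank = {v: init.get(v, 1.0 / n) for v in nodes}
        total = sum(rank.values())
        rank = {v: r / total for v, r in rank.items()}
    else:
        rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, outs in links.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)  # L1 change
        rank = new
        if delta < tol:
            break
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(graph))
```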

Which aspects of the Web are dynamic?
- Size: pages added and deleted all the time
- Content: pages change all the time
- Structure: links added (and dropped) all the time
- Usage: user behaviour changes all the time

Reasons why user behaviour changes
- Global trends and changes, Web 2.0 (Flickr, YouTube, social networks, Twitter, …)
- Different situation/context:
  - Roles (private vs. professional)
  - Locations (home vs. office vs. travelling)
  - Date & time
  - Tasks (ordering a book, booking a flight, …)
These factors influence browsing and search behaviour.

Challenges: User dynamics
How can a search engine adapt to changing users?
- Identify the user (e.g., Google's cookie)
- Collect user behaviour
- Personalize search results based on past actions
- Personalize based on the current context
This can be done
- for each user
- for groups of users
- for all users ("global user model")
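A minimal illustration of "personalize search results based on past actions": blend each result's original score with the share of the user's past clicks that went to its domain. Everything here is a hypothetical assumption for illustration (the function name, the linear blend, the weight 0.3, the example domains); it is not how any particular search engine personalizes. Feeding in one user's clicks, a group's aggregated clicks, or everyone's clicks gives the per-user, per-group and global variants listed above.

```python
from collections import Counter
from urllib.parse import urlparse


def personalize(results: list[tuple[str, float]], clicks: Counter,
                weight: float = 0.3) -> list[str]:
    """Re-rank (url, base_score) pairs using past clicks per domain.

    score = (1 - weight) * base_score + weight * (share of past clicks on that domain);
    weight = 0 ranks by base score alone.
    """
    total = sum(clicks.values()) or 1  # avoid division by zero for new users

    def score(item: tuple[str, float]) -> float:
        url, base = item
        preference = clicks[urlparse(url).netloc] / total
        return (1 - weight) * base + weight * preference

    return [url for url, _ in sorted(results, key=score, reverse=True)]


results = [("https://news.example/a", 0.90), ("https://shop.example/b", 0.80)]
user_clicks = Counter({"shop.example": 8, "news.example": 2})
print(personalize(results, user_clicks))  # shop.example result moves to the top
```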

1.2 Application examples

Live search in news streams

Search in past news streams

Google Trends: Hot searches

Google Trends: Search stats

Google Insights: Trends in searches

Google Website Trends: Access stats

Google News Timeline: News trends

Google Web Timeline: Date extraction

Google Zeitgeist: Frequent searches

Internet Archive: Wayback Machine

More Web archiving: Iterasi

References
- T. Bray: Measuring the Web, WWW Conference
- K. Bharat, A. Broder: A technique for measuring the relative size and overlap of public Web search engines, WWW Conference, 1998
- A. Gulli, A. Signorini: The indexable Web is more than 11.5 billion pages, WWW Conference, 2005
- S. Lawrence, C. L. Giles: Accessibility of information on the Web, Nature 400, 1999
- J. Domenech et al.: A user-focused evaluation of Web prefetching algorithms, Computer Communications 30(10), 2007
- R. Sadre, B. Haverkort: Changes in the Web from 2000 to 2007, Workshop on Distributed Systems: Operations and Management, 2008
- K. M. Risvik, R. Michelsen: Search engines and Web dynamics, Computer Networks 39, 2002
- Y. Ke et al.: Web dynamics and their ramifications for the development of Web search engines, Computer Networks 50, 2006
- R. Baeza-Yates et al.: Web structure, dynamics and page quality, SPIRE Conference, 2002
- V. N. Padmanabhan, L. Qiu: The content and access dynamics of a busy Web site: Findings and implications, SIGCOMM Conference, 2000
- L. Cherkasova, M. Karlsson: Dynamics and evolution of Web sites: Analysis, metrics and design issues, IEEE International Symposium on Computers and Communications, 2001
- J. Cho, H. Garcia-Molina: Estimating frequency of change, ACM Transactions on Internet Technology 3(3), 2003
- J. Cho, H. Garcia-Molina: The evolution of the Web and implications for an incremental crawler, VLDB Conference, 2000
- A. Ntoulas, J. Cho, C. Olston: What's new on the Web? The evolution of the Web from a search engine perspective, WWW Conference, 2004
- R. Schenkel: Temporal shingling for version identification in Web archives, ECIR Conference, 2010


More information

Google Analytics: A Worm's-Eye View & DigitalCommons Usage Reports

Google Analytics: A Worm's-Eye View & DigitalCommons Usage Reports University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Library Conference Presentations and Speeches Libraries at University of Nebraska-Lincoln 4-1-2010 Google Analytics: A Worm's-Eye

More information

Campaign Goals, Objectives and Timeline SEO & Pay Per Click Process SEO Case Studies SEO, SEM, Social Media Strategy On Page SEO Off Page SEO

Campaign Goals, Objectives and Timeline SEO & Pay Per Click Process SEO Case Studies SEO, SEM, Social Media Strategy On Page SEO Off Page SEO Campaign Goals, Objectives and Timeline SEO & Pay Per Click Process SEO Case Studies SEO, SEM, Social Media Strategy On Page SEO Off Page SEO Reporting Pricing Plans Why Us & Contact Generate organic search

More information

Measurement and Tracking Awareness June 2011

Measurement and Tracking Awareness June 2011 Measurement and Tracking Awareness June 2011 1 2010-2011 Cooperative Awareness Program Goals and Measurement The below goals were based on a initial budget of $547,572, actual media spend was $418,483.

More information

Online Research Methodology. Dr. David R. Fletcher

Online Research Methodology. Dr. David R. Fletcher Online Research Methodology Dr. David R. Fletcher drf@xpastor.org www.xpastor.org Areas of Discussion Archived Databases DTS Library Databases Search Engines Footnotes & Bibliographies Archived Databases

More information

Beyond Ten Blue Links Seven Challenges

Beyond Ten Blue Links Seven Challenges Beyond Ten Blue Links Seven Challenges Ricardo Baeza-Yates VP of Yahoo! Research for EMEA & LatAm Barcelona, Spain Thanks to Andrei Broder, Yoelle Maarek & Prabhakar Raghavan Agenda Past and Present Wisdom

More information

Slide 1. Opera Max. Migrating the next billion smartphone users for better app experience

Slide 1. Opera Max. Migrating the next billion smartphone users for better app experience Slide 1 Opera Max Migrating the next billion smartphone users for better app experience The 3 Consideration in the Next Billion Migration Slide 2 Cost of ownership (Device) Cost of usage (Data) Network

More information

TempWeb rd Temporal Web Analytics Workshop

TempWeb rd Temporal Web Analytics Workshop TempWeb 2013 3 rd Temporal Web Analytics Workshop Stuff happens continuously: exploring Web contents with temporal information Omar Alonso Microsoft 13 May 2013 Disclaimer The views, opinions, positions,

More information

SOCIAL MEDIA. Charles Murphy

SOCIAL MEDIA. Charles Murphy SOCIAL MEDIA Charles Murphy Social Media Overview 1. Introduction 2. Social Media Areas Blogging Bookmarking Deals Location-based Music Photo sharing Video 3. The Fab Four FaceBook Google+ Linked In Twitter

More information

Understanding Today s Mobile Device Shopper. Google/Compete, U.S. Mar 2011

Understanding Today s Mobile Device Shopper. Google/Compete, U.S. Mar 2011 Understanding Today s Mobile Device Shopper Google/Compete, U.S. Mar 2011 Methodology This study was based on understanding the attitudes of online users who identified themselves as wireless purchasers

More information

Internet Basics. Basic Terms and Concepts. Connecting to the Internet

Internet Basics. Basic Terms and Concepts. Connecting to the Internet Internet Basics In this Learning Unit, we are going to explore the fascinating and ever-changing world of the Internet. The Internet is the largest computer network in the world, connecting more than a

More information

Effective Web Crawlers

Effective Web Crawlers Effective Web Crawlers A thesis submitted for the degree of Doctor of Philosophy Halil Ali B.App.Sc (Hons.), School of Computer Science and Information Technology, Science, Engineering, and Technology

More information

Web Science Your Business, too!

Web Science Your Business, too! Web Science & Technologies University of Koblenz Landau, Germany Web Science Your Business, too! Agenda What is Web Science? An explanation by analogy What do we do about it? Understanding collective effects

More information

Revenue Growth with Evergreen Na2ve Content November 2016, Berlin

Revenue Growth with Evergreen Na2ve Content November 2016, Berlin Revenue Growth with Evergreen Na2ve Content November 2016, Berlin Dominik Grau Chief Innova,on Officer, Ebner Media Group @dominikgrau Dominik Grau Chief Innova,on Officer Ebner Media Group 2011-2015:

More information

Mobile Search: Techniques and Tactics for Marketers

Mobile Search: Techniques and Tactics for Marketers Mobile Search: Techniques and Tactics for Marketers Follow along using #mobileppc Eli Goodman & Mike Solomon *Note: A copy of this presentation will be sent to all attendees within 2-3 business days Our

More information

How To Construct A Keyword Strategy?

How To Construct A Keyword Strategy? Introduction The moment you think about marketing these days the first thing that pops up in your mind is to go online. Why is there a heck about marketing your business online? Why is it so drastically

More information

Bi monthly calendar 2016 printable free

Bi monthly calendar 2016 printable free P ford residence southampton, ny Bi monthly calendar 2016 printable free You won't want to miss this fabulous collection of FREE home organization printables, menu planners, and cleaning schedules. Pinterest

More information

The Next Internet Revolution

The Next Internet Revolution The Next Internet Revolution Panel Detail: Wednesday, May 4, 211 8: AM - 9:15 AM Speakers: Sam Feder, Partner, Jenner & Block LLP John Rogovin, Executive Vice President and General Counsel, Warner Bros.

More information

Search Engines Considered Harmful In Search of an Unbiased Web Ranking

Search Engines Considered Harmful In Search of an Unbiased Web Ranking Search Engines Considered Harmful In Search of an Unbiased Web Ranking Junghoo John Cho cho@cs.ucla.edu UCLA Search Engines Considered Harmful Junghoo John Cho 1/45 World-Wide Web 10 years ago With Web

More information

Traveler s Path to Purchase

Traveler s Path to Purchase Traveler s Path to Purchase DEREK PRICE Director, North America Expedia Media Solutions Previous experience: More than 20 years experience in the travel industry holding roles in everything from Leisure

More information

ONLINE MARKETING INTELLIGENCE. Insights on how to navigate in the ever changing world of Google

ONLINE MARKETING INTELLIGENCE. Insights on how to navigate in the ever changing world of Google ONLINE MARKETING INTELLIGENCE Insights on how to navigate in the ever changing world of Google GROWING MARKET Search media constitutes an increasing part of the online marketing budget for most companies

More information

PRELIDA. D2.3 Deployment of the online infrastructure

PRELIDA. D2.3 Deployment of the online infrastructure Project no. 600663 PRELIDA Preserving Linked Data ICT-2011.4.3: Digital Preservation D2.3 Deployment of the online infrastructure Start Date of Project: 01 January 2013 Duration: 24 Months UNIVERSITAET

More information

Effective Page Refresh Policies for Web Crawlers

Effective Page Refresh Policies for Web Crawlers For CS561 Web Data Management Spring 2013 University of Crete Effective Page Refresh Policies for Web Crawlers and a Semantic Web Document Ranking Model Roger-Alekos Berkley IMSE 2012/2014 Paper 1: Main

More information

The Web: Concepts and Technology. January 15: Course Overview

The Web: Concepts and Technology. January 15: Course Overview The Web: Concepts and Technology January 15: Course Overview 1 Today s Plan Who am I? What is this course about? Logistics Who are you? 2 Meet Your Instructor Instructor: Eugene Agichtein Web: http://www.mathcs.emory.edu/~eugene

More information

Making the Most of the Full Range of Digital & Integrated Fundraising Channels

Making the Most of the Full Range of Digital & Integrated Fundraising Channels Making the Most of the Full Range of Digital & Integrated Fundraising Channels Grace Ho, Baptist Oi Kwan Social Service Speaker s Bio Jason Potts Director of THINK Consulting Solutions (UK) Heather Tallent

More information

Microsoft Exam

Microsoft Exam Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred

More information

Prototyping Data Intensive Apps: TrendingTopics.org

Prototyping Data Intensive Apps: TrendingTopics.org Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page

More information

You ve Got Mail! List Offer Creative. Timely insights & trends. Katie Parker Editorial Director. Zach Christensen Creative Director

You ve Got Mail! List Offer Creative. Timely  insights & trends. Katie Parker Editorial Director. Zach Christensen Creative Director You ve Got Mail! Timely email insights & trends Colleen Webster Digital Solutions Director Katie Parker Editorial Director Zach Christensen Director List Pop Quiz! #1 Direct marketing rule: your list better

More information

OpenINTEL an infrastructure for long-term, large-scale and high-performance active DNS measurements. Design and Analysis of Communication Systems

OpenINTEL an infrastructure for long-term, large-scale and high-performance active DNS measurements. Design and Analysis of Communication Systems OpenINTEL an infrastructure for long-term, large-scale and high-performance active DNS measurements DACS Design and Analysis of Communication Systems Why measure DNS? (Almost) every networked service relies

More information

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype? Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/

More information

A Review Paper on Big data & Hadoop

A Review Paper on Big data & Hadoop A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College

More information

Information Networks. Hacettepe University Department of Information Management DOK 422: Information Networks

Information Networks. Hacettepe University Department of Information Management DOK 422: Information Networks Information Networks Hacettepe University Department of Information Management DOK 422: Information Networks Search engines Some Slides taken from: Ray Larson Search engines Web Crawling Web Search Engines

More information

MBB Robot Crawler Data Report in 2014H1

MBB Robot Crawler Data Report in 2014H1 MBB Robot Crawler Data Report in 2014H1 Contents Contents 1 Introduction... 1 2 Characteristics and Trends of Web Services... 3 2.1 Increasing Size of Web Pages... 3 2.2 Increasing Average Number of Access

More information

Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Overview Overview Introduction Classic

More information