Webscraping at Statistics Netherlands
1 Webscraping at Statistics Netherlands
Olav ten Bosch
23 March 2016, ESSnet Big Data WP2, Rome
2 Content
- Internet as a datasource (IAD): motivation
- Some IAD projects over past years
- Technologies used
- Summary / trends
- Observations / thoughts
- Legal
- The Dutch Business Register
3 The why
- Administrative sources
- Tax, social security services
- Municipalities / Provinces
- Supermarkets
- Internet sources
- Surveys
4 Fuel prices (2009)
- Daily fuel prices from the website of unmanned petrol stations (tinq.nl)
- Regional prices (per station) every day
- Now (2016): a direct data feed from a travelcard company, weekly
- Fuel prices per day and all transactions of that week
- Publication on the website: prices per month
5 Airline tickets (2010)
- Pilot: 3 robots on 6 airline companies
- 2 robots by external companies, 1 by Statistics Netherlands (SN)
- Prices agree with manual collection
- Quite expensive; negative business case
- 2016: still manual price collection of airline tickets
[Figure: ticket price Amsterdam - Milano, robot versus manual collection, February to August]
6 Housing market (from 2011)
- Discussions with external company for > 1 year (iwoz)
- We scraped 5 sites, about […] observations / week, 2 years
- […] >: Direct feed from one of the sites (Jaap.nl)
- StatLine tables: "Bestaande woningen in verkoop" (existing dwellings for sale), based on […] percent of the market
7 Bulk price collection for CPI (1)
- Bulk price collection for CPI (from 2012): mainly clothing
- Software scrapes all prices and product data (id, name, description, category, colour, size, …)
- 2016: about […] price observations daily from 10 sites
- Data from 3 sites used in production of Dutch CPI
- Price collection process embedded in organisation
- Plans to extend to > 20 sites; other domains
8 Bulk price collection for CPI (2)
Processing bulk data from the Internet
- Data collection & feature extraction
- Features: Fine-knit, Jumper, Dark blue, Striped, Cotton edges
- Structured data
- Big Data
- Index methods
- Index based on internet data
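
The feature extraction step on this slide could look roughly like the sketch below. It is an illustration only, not the method used at Statistics Netherlands: the vocabularies in `VOCABULARY` and the function name `extract_features` are assumptions made for the example, and plain substring matching stands in for whatever technique was really used.

```python
# Hypothetical vocabularies; a production system would need far larger lists.
# Within each list, longer terms come first so "dark blue" wins over "blue".
VOCABULARY = {
    "garment": ("jumper", "shirt", "trousers"),
    "colour": ("dark blue", "blue", "red"),
    "pattern": ("striped", "plain"),
    "material": ("cotton", "wool"),
}


def extract_features(description, vocabulary=VOCABULARY):
    """Turn a free-text product description into structured features."""
    text = description.lower()
    features = {}
    for name, terms in vocabulary.items():
        for term in terms:
            if term in text:
                features[name] = term  # keep the first (longest) match
                break
    return features


features = extract_features("Fine-knit Jumper Dark blue Striped Cotton edges")
print(features)
```

Structured records of this kind are what index methods need: prices can only be compared over time for products that share the same features.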
9 Robot-assisted price collection
- Robot tool for detecting price changes on (parts of) websites
- Traffic light indicates status:
  - Green: nothing changed, price is saved in database
  - Red: some change, needs attention of statistician
- Two clicks to keep the old price or store a new one
- In production since 2014
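
The traffic-light logic can be sketched as follows. This is a minimal illustration under assumptions, not the actual robot tool: the names `Observation` and `check_price` are invented here, and the watched part of the page is assumed to be available as a text fragment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    status: str           # "green" or "red"
    price: Optional[str]  # price to store automatically; None if review is needed


def check_price(previous_fragment, current_fragment, previous_price):
    """Compare the watched part of a page with the version seen last time."""
    # Normalise whitespace so that pure layout changes do not raise an alarm.
    if " ".join(previous_fragment.split()) == " ".join(current_fragment.split()):
        # Green: nothing changed, the known price is saved again.
        return Observation("green", previous_price)
    # Red: something changed, a statistician decides what to store.
    return Observation("red", None)


same = check_price("<b>EUR 12.50</b>", "<b>EUR  12.50</b>", "12.50")
changed = check_price("<b>EUR 12.50</b>", "<b>EUR 13.00</b>", "12.50")
print(same.status, changed.status)
```

The design keeps the statistician in the loop: the robot only decides automatically in the unchanged case, and every change is handed over for a manual decision.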
10 Collect data on enterprises for EGR (2013)
- Pilot: find data about EGR enterprises on the web
- We scraped semi-structured data from Wikipedia
- Multiple Wikipedia languages (NL, EN, DE, FR)
- 2016: something similar in ESSnet BD WP2?
11 Search product descriptions for classifying business activities
- Search product descriptions on the web (from 2014)
- First time we used automated search with the Google Search API for statistics
- Pilot, no production
- Some doubts about the Google results
12 Twitter-LinkedIn (1)
- LinkedIn-Twitter for profiling (2015)
- Automated search on LinkedIn based on a sample of Twitter users
- Very specific and experimental
- Reference: "Profiling of Twitter data, a big data selectivity study", Piet Daas, Joep Burger, Quan Lé, Olav ten Bosch
14 Scraping websites of enterprises
- Identify family businesses (search and / or crawling) (2016)
- Identify businesses with a Corporate Social Responsibility (CSR) (search and / or crawling) (2016)
- Research program: "Extracting information from websites to improve economic figures"
- This ESSnet BD WP2!!!
15 Crawling for Statistics
- Url-base
- Incomplete statistical data
- Search terms
- Internet
- Focused Crawler (Roboto)
- Navigation terms
- Item identifier terms: year report, family business
- Data store
- Search & Match
- ElasticSearch
- More complete statistical data
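
A focused crawler of this kind uses two term lists: navigation terms decide which links are worth following, and item identifier terms decide whether a page contains what is being looked for. The sketch below illustrates the idea in Python. It does not use Roboto's API, and the navigation terms and function names are assumptions for the example (only the two item terms come from the slide).

```python
# Hypothetical navigation terms; the item terms are the two named on the slide.
NAVIGATION_TERMS = ("about", "investor", "annual report")
ITEM_TERMS = ("year report", "family business")


def rank_links(links, navigation_terms=NAVIGATION_TERMS):
    """Return the URLs worth following, most promising first.

    links is a list of (url, anchor_text) pairs found on a page.
    """
    scored = []
    for url, anchor in links:
        text = f"{url} {anchor}".lower()
        score = sum(term in text for term in navigation_terms)
        if score > 0:  # links matching no navigation term are not followed
            scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


def find_items(page_text, item_terms=ITEM_TERMS):
    """Return the item identifier terms that occur in the page text."""
    lowered = page_text.lower()
    return [term for term in item_terms if term in lowered]


links = [
    ("https://example.com/about-us", "About us"),
    ("https://example.com/shop", "Shop"),
    ("https://example.com/investors", "Investor relations: annual report"),
]
to_visit = rank_links(links)
hits = find_items("We have been a family business since 1923.")
print(to_visit, hits)
```

Pages with hits would then go to the data store, where a search-and-match step links them back to the units in the incomplete statistical data.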
16 Technologies used
- Perl (2009), Djuggler (2010)
- Python, Scrapy (2010)
- R ([…])
- NodeJS (JavaScript on server) (2014-)
- Google Search API (2014-)
- ElasticSearch (2016)
- Roboto (nodejs package, […])
- Nutch: tested, not used
- Generic framework (robot framework) for bulk scraping of prices
17 Summary / trends
Columns: Production | Scrape | Search | Crawl | External company
- Tinq: x (x) Travelcard
- Airlines: x 2 robots
- Housing: x (x) Jaap.nl
- BulkCPI: x x
- Robottool: x x (x)
- EGR: x x RGS
- Twitter / LinkedIn: x
- Enterprises: x x Dataprovider? x x
18 Observations / thoughts
- If it is there, we can get it
- Technology is (usually) not the problem!
- The internet is a living thing!
- It's too simple to think we can just buy the internet somewhere and then make statistics!
- It's powerful to combine something we know with something we observe!
- External companies can help, but be careful
20 Legal
- Dutch Statistics Law:
  - Enterprises have to provide data to Statistics Netherlands on request
  - Scraping information from websites reduces response burden
  - Statistics Netherlands uses data for official statistics only
- Dutch database legislation:
  - Commercial re-use of intellectual property is forbidden
  - This may also apply to internet sources
- Privacy:
  - Dutch (statistical) legislation on protection of personal information
  - Statistics Netherlands only scrapes public sources and processes data within its own safe environment, just as with other (privacy-sensitive) data internally
- Netiquette:
  - respect robots.txt
  - identify yourself (user-agent)
  - do not overload servers, use some idle time between requests
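
The three netiquette rules can be sketched with Python's standard library. The example is a minimal sketch under assumptions, not the scraper used at Statistics Netherlands: the bot name, the pause length and the robots.txt content are invented, and `fake_fetch` replaces real network access so that the example runs offline.

```python
import time
from urllib.robotparser import RobotFileParser

# Assumptions for illustration only, none of these values come from the slides.
USER_AGENT = "example-statistics-bot"
IDLE_SECONDS = 2.0

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl(urls, parser, fetch, idle_seconds=IDLE_SECONDS, user_agent=USER_AGENT):
    """Fetch the allowed URLs one by one, pausing after each request."""
    pages = {}
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # respect robots.txt
        pages[url] = fetch(url, user_agent)  # identify yourself
        time.sleep(idle_seconds)  # do not overload the server
    return pages


def fake_fetch(url, user_agent):
    """Offline stand-in; a real fetch would send user_agent as the User-Agent header."""
    return f"fetched {url} as {user_agent}"


urls = ["https://example.com/prices", "https://example.com/private/data"]
pages = crawl(urls, build_parser(ROBOTS_TXT), fake_fetch, idle_seconds=0)
print(sorted(pages))
```

Passing the fetch function in as a parameter keeps the politeness rules in one place, whatever HTTP library does the actual downloading.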
21 Dutch Business Register (simplified)
- From administrative units to statistical units:
  - Legal units relationships
  - Cluster of control
  - Enterprise groups
  - Enterprises
  - Local units
- Sources:
  - Trade Register
  - Tax Register
  - Social security register (employees)
  - Profilers
- About 1.5 million administrative entities
- About 0.5 million have a URL
- Quality of the URL field not known, but seems usable