Becoming a Web Archivist: My 10 Year Journey in the National Library of Estonia

Similar documents
Strategy for long term preservation of material collected for the Netarchive by the Royal Library and the State and University Library 2014

1. Name of Your Organization. 2. About Your Organization. Page 1. nmlkj. nmlkj. nmlkj. nmlkj. nmlkj. nmlkj. nmlkj. nmlkj. nmlkj. nmlkj. nmlkj.

NDSA Web Archiving Survey

CAT (CURATOR ARCHIVING TOOL): IMPROVING ACCESS TO WEB ARCHIVES

Community Tools and Best Practices for Harvesting and Preserving At-Risk Web Content ACA 2013

Archiving and Preserving the Web. Kristine Hanna Internet Archive November 2006

Preservation of Web Materials

WEB ARCHIVE COLLECTING POLICY

NATION WIDE WEBS. Jefferson Bailey, Director, Web Archiving & Data Services, Internet Archive IIPC WAC NLNZ 2018

Archiving the Web: What can Institutions learn from National and International Web Archiving Initiatives

CDL s Web Archiving System

User Manual Version August 2011

Digital The Harold B. Lee Library

Oh My, How the Archive has Grown: Library of Congress Challenges and Strategies for Managing Selective Harvesting on a Domain Crawl Scale

User Manual. Version September 2013

Demos: DMP Assistant and Dataverse

Appendix B SUBSCRIPTION TERMS AND CONDITIONS. Addendum to British Columbia Electronic Library Network Electronic Products License Agreement

Web Archiving at UTL

Meeting researchers needs in mining web archives: the experience of the National Library of France

Web-Archiving: Collecting and Preserving Important Web-based National Resources

2008 DOT GOV HARVEST PRESERVING ACCESS

Preserving Public Government Information: The 2008 End of Term Crawl Project

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Harvesting Democracy: Archiving Federal Government Web Content at End of Term

Research Data Edinburgh: MANTRA & Edinburgh DataShare. Stuart Macdonald EDINA & Data Library University of Edinburgh

Data Curation Handbook Steps

Introduction... 2 New Features... 3

Navigating the Universe of ETDs: Streamlining for an Efficient and Sustainable Workflow at the University of North Florida Library

Legal Deposit of Online Newspapers at the BnF - Clément Oury - IFLA PAC Paris 2012

Deposit instructions for social and behavioural sciences

August 14th - 18th 2005, Oslo, Norway. Web crawling : The Bibliothèque nationale de France experience

The Metadata Assignment and Search Tool Project. Anne R. Diekema Center for Natural Language Processing April 18, 2008, Olin Library 106G

Veritas Backup Exec Migration Assistant

Néonaute: mining web archives for linguistic analysis

Caring for research data and what about software? Peter Doorn, director DANS

Ensuring Proper Storage for Earth Science Data: The USGS Process to Certify Trusted Digital Repositories

Safe Havens in a Choppy Sea:

Pam Armstrong Library and Archives Canada Ottawa, Canada

The DART-Europe E-theses Portal

Inside Discovery The Journey to Summon 2.0

Trust and Certification: the case for Trustworthy Digital Repositories. RDA Europe webinar, 14 February 2017 Ingrid Dillo, DANS, The Netherlands

Applying Archival Science to Digital Curation: Advocacy for the Archivist s Role in Implementing and Managing Trusted Digital Repositories

Preserving Legal Blogs

Digitisation of historic newspapers and voluntary digital deposit of newspaper pre-print files in the the National Library of Estonia

Digital Preservation at NARA

DataFlow and VIDaaS Workshop

MANAGE YOUR CONSTRUCTION21 COMMUNITY

Web Archiving Workshop

Protecting Future Access Now Models for Preserving Locally Created Content

Centre of Registers and Information Systems. Kätlin Kattai Head of International Relations

AS/NZS ISO 13008:2014

7.3. In t r o d u c t i o n to m e t a d a t a

Building a Web Curator Tool for The National Library of New Zealand

Northeast Historic Film Summer Symposium

Powering Official Statistics at Statistics New Zealand with DDI-L and Colectica

Using PBworks in the Classroom/Library. What is a wiki? Wiki means quick or fast in Hawaiian. How to use a wiki for your classroom/library

Science Europe Consultation on Research Data Management

LC Genre/Form Headings and What They Mean for I-Share Libraries. Prepared by the I-Share Cataloging and Authority Control Team (ICAT)

2017 NACHA Third-Party Sender Initiatives

From The European Library to The European Digital Library. Jill Cousins Inforum, Prague, May 2007

UC Irvine LAUC-I and Library Staff Research

The e-depot in practice. Barbara Sierman Digital Preservation Officer Madrid,

Copernicus Space Component. Technical Collaborative Arrangement. between ESA. and. Enterprise Estonia

Workshop B: Archiving the Web

Web archiving: practices and options for academic libraries

Electronic Records Archives: Philadelphia Federal Executive Board

Records Management and Interoperability Initiatives and Experiences in the EU and Estonia

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Research Data Management: Edinburgh University Library s Approach Dominic Tate

Issues and Opportunities Associated with Federated Searching

Data Management Checklist

Using the Vita Group Citrix Portal

Feed the Future Innovation Lab for Peanut (Peanut Innovation Lab) Data Management Plan Version:

The Point of View of the Publisher

Building Illinois Electronic Documents Access

International Implementation of Digital Library Software/Platforms 2009 ASIS&T Annual Meeting Vancouver, November 2009

National Library 2

Heiðrun. Building DPLA s New Metadata Ingestion System. Mark A. Matienzo Digital Public Library of America

I data set della ricerca ed il progetto EUDAT

CREATING DIGITAL REPOSITORIES PRESENTED BY CHAMA MPUNDU MFULA CHIEF LIBRARIAN NATIONAL ASSEMBLY OF ZAMBIA

Update on the TDL Metadata Working Group s activities for

Instructions for taking Supplier Ethical Expectations Training on ILN

State Government Digital Preservation Profiles

ATI Web Accessibility Report AY 09/10

POSITION DESCRIPTION

INSPIRE & Environment Data in the EU

Edward M. Corrado and Sandy Card: ELUNA 2011

The type of organization for which you created the collection and the potential user and their needs.

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Semantic Interoperability of Basic Data in the Italian Public Sector Giorgia Lodi

How to contribute information to AGRIS

Crossing the Archival Borders

How to Register where a Brownfield Land Register has been published.

Accession Procedures Born-Digital Materials Workflow

Instructions for taking Supplier Ethics & Expectations Training aka Supplier Ethical Expectations

Putting the Discovery in WorldCat Discovery Daniel Jolley

From Open Data to Data- Intensive Science through CERIF

Making Open Data work for Europe

picturemaxx my-picturemaxx THE #1 MARKETPLACE FOR MEDIA BUYERS CONNECT DISCOVER CREATE

Transcription:

Becoming a Web Archivist: My 10 Year Journey in the National Library of Estonia Tiiu Daniel National Library of Estonia IIPC Web Archiving Conference, New Zealand, Wellington November 13, 2018

You can't connect the dots looking forward; you can only connect them looking backwards (Steve Jobs) 2002 first contact with web preservation topic Bachelor s thesis Library s Electronic Catalog as Intermediator of Internet Resources (Information Sciences, Tallinn University) 2008 senior bibliographer of web resources; Collection Development Department, National Library of Estonia 2009 preparations to relaunch the web archiving program

Legal aspects of web archiving 2006 2016 Legal Deposit Act allowed to collect web publications and make them publicly accessible since 2017 - new coplitely revised Legal Deposit Copy Act More specific criteria Restricted access, except if permissions from copyright holders were gained Open access to government websites by default (according to Public Information Act)

Selection principles Former policy overly library oriented 2010 - expert group of people from different memory and research institutions to cover wider range of interests 2011 - new selection policy 2010-2015 only selective and event/thematic harvesting 2016 first bulk harvesting of Estonian web content

Aquisition (methods and tools) 2010-2015 Selective, event/topical Netarcive Suite Heritrix version 1.14 Multiple seeds in one job (topically divided) QA manual browsing Since 2016 Selective, event/topical, bulk in-house built curation tool Heritrix version 3.3 One seed per job (with few exceptions) QA manual browsing, Heritrix reports analysis Screenshots of websites homepages (since 2016) Deduplication (since 2017)

Curator tool Krool

Content description and discovery Thematic catalog of archived websites Search by URL or words in metadata fields (title, description, seed/related URL) Not standarized yet implementation of Dublin Core and OCLC s Descriptive Metadata for Web Archiving planned in 2019 Wayback Machine s old version in public portal Python WayBack for internal use (QA) Full-text search in coming years

Preservation Archive size 28,7 TB Bit preservation 2 copies: library s disc space + magnetic tape outside the library Long-term preservation solution on the analysis stage with as-is process map freshly done, final results are expected in March 2019

Formation of the web archive team Profile 2009-2012 2013 2014 2015 2017- Web curator 1 full-time 1 full-time + 1 part-time 3 full-time 2 full-time 2 full-time Technical staff 1 part-time 1 part-time 1 part-time 1 full-time 2 full-time

IIPC our supporting pillar Member since 2012 Most trusted source of expertise Specific community for a very specific task almost like a secret sect Different communication channels, webinars, annual meetings etc. Cooperation through collaborative collections Not too active in working groups, more keeping eye on them

Concluding remarks Comparing to 2008, landscape has changed remarkably Web archiving is widely recognized activity among memory institutions Most national libraries (at least in Europe) have launched their nations web preservation programs Plenty of (professional) information and events organized Bigger media coverage Web is like a jungle constant struggle with obstacles in capturing the content (robot blocking, limited technical tools, authenticity etc)

Concluding remarks Things I love about my work: uniqueness very versatile in nature encourages creativity challenging enthusiastic and devoted colleagues great international community...etc.

Thank You! Tiiu Daniel tiiu.daniel@nlib.ee http://veebiarhiiv.digar.ee