CDL's Web Archiving System

Transcription

1 CDL's Web Archiving System Erik Hetzner UC3, California Digital Library 16 June 2011 Erik Hetzner (UC3, California Digital Library) CDL's Web Archiving System 16 June / 24

2 Introduction: We don't decide what to collect. We don't decide when to collect it. We build tools to allow curators to make those decisions.

3 Introduction: Vital statistics. 35 public archives; 16 partners; 2724 web sites; 289,272,095 URLs ( 2); 16.1 TB ( 2).

8 Introduction: The Web Archiving Service

9 Introduction: Archive

10 Introduction: Search

11 Introduction: Site list

12 Introduction: Archived page

13 How we do it: Collection focus (unofficial). Middle East political sites (Stanford); social movements (Tamiment, NYU); California government sites (UC).

14 How we do it: Tools. Heritrix 1.14.x; open-source Wayback; NutchWAX (moving to Solr); CDL's legacy Digital Preservation Repository; ... and a lot of UI code; ... ARC management, indexing scripts, etc.

15 Difficulties: Web archiving is easy*, but there are some difficulties.

16 Difficulties: Uneven coverage. We only crawl what our curators select.

17 Difficulties: Human selection. High precision; low recall.
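
The precision/recall trade-off named on this slide can be made concrete with the standard information-retrieval definitions. The sketch below is illustrative only: the function name and the example site sets are assumptions, not figures or code from the talk.

```python
def precision_recall(selected: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a curator's selection.

    selected: sites the curators chose to crawl (hypothetical input).
    relevant: all sites that actually belong in the collection (hypothetical input).
    """
    hits = len(selected & relevant)
    # Precision: what fraction of the selected sites are relevant.
    precision = hits / len(selected) if selected else 0.0
    # Recall: what fraction of the relevant sites were selected.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hand-picked sites are all on topic, but cover a small part of the topic.
selected_sites = {"site-a", "site-b"}
relevant_sites = {"site-a", "site-b", "site-c", "site-d",
                  "site-e", "site-f", "site-g", "site-h"}
precision, recall = precision_recall(selected_sites, relevant_sites)
```

With these made-up sets, every selected site is relevant (high precision) while most relevant sites go uncrawled (low recall), which is the pattern the slide attributes to human selection.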

18 Difficulties: Scale. We are not Internet Archive scale, but we are big enough that it takes a long time to do anything.

19 Difficulties: Collection mismatch. Our crawls are organized into collections. Everybody [?] else has one big archive.

20 Difficulties: Politics. We are customer-driven: we need to convince customers that collaboration is good for them.

21 Possibilities: What's on our plate. Deduplication... requires a new index (Solr); moving to our new Merritt repository; implementing Memento.
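
Why deduplication needs an index can be seen in a minimal sketch: before storing a capture, the system has to look up whether the same content for that URL was already stored. The code below is an assumption-laden illustration, not CDL's implementation; the in-memory dictionary stands in for the index (Solr, per the slide), and the function name, tuple layout, and SHA-1 digest choice are all hypothetical.

```python
import hashlib


def dedupe(captures):
    """Split captures into newly stored records and duplicate revisits.

    captures: iterable of (url, timestamp, payload_bytes) tuples (assumed layout).
    Returns (stored, revisits), where each revisit points at the timestamp
    of the first capture that had identical content for the same URL.
    """
    index = {}  # (url, digest) -> timestamp of first capture; stand-in for a real index
    stored, revisits = [], []
    for url, timestamp, payload in captures:
        digest = hashlib.sha1(payload).hexdigest()
        key = (url, digest)
        if key in index:
            # Same URL, same bytes: record a pointer instead of a second copy.
            revisits.append((url, timestamp, index[key]))
        else:
            index[key] = timestamp
            stored.append((url, timestamp, digest))
    return stored, revisits


example_captures = [
    ("http://example.org/", "20110601", b"<html>unchanged</html>"),
    ("http://example.org/", "20110602", b"<html>unchanged</html>"),
    ("http://example.org/", "20110603", b"<html>changed</html>"),
]
stored, revisits = dedupe(example_captures)
```

In this toy run the second capture is byte-identical to the first, so only two payloads are stored and one revisit pointer is kept. At archive scale the lookup cannot live in memory, which is one plausible reading of why the slide ties deduplication to a new index.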

25 Possibilities: Evaluating community needs. What do we have that you need? What do you have that we need?

26 Possibilities: Collaboration with researchers. The hard, fun problems are not necessarily the ones that we need solved. But maybe we can work it out.

27 Possibilities: Temporal search. How can we rank (and display) results across time?

28 Possibilities: Standards. Standards for sharing, or providing computational access to, metadata or full content.

29 Possibilities: The changing web. Flash and HTML5 throw a monkey wrench into the web.

30 Possibilities: Cross-archive collections. There is no reason why our curators should only be using our crawls. How can we build collections that span archives?

31 Possibilities: CDL's Web Archiving Service. We build tools; curators build collections. We are ready to be part of a global web archive infrastructure. What next? Thanks for having me, and thanks for listening. erik.hetzner@ucop.edu


More information

The Materials Data Facility

The Materials Data Facility The Materials Data Facility Ben Blaiszik (blaiszik@uchicago.edu), Kyle Chard (chard@uchicago.edu) Ian Foster (foster@uchicago.edu) materialsdatafacility.org What is MDF? We aim to make it simple for materials

More information

NATION WIDE WEBS. Jefferson Bailey, Director, Web Archiving & Data Services, Internet Archive IIPC WAC NLNZ 2018

NATION WIDE WEBS. Jefferson Bailey, Director, Web Archiving & Data Services, Internet Archive IIPC WAC NLNZ 2018 NATION WIDE WEBS Jefferson Bailey, Director, Web Archiving & Data Services, Internet Archive IIPC WAC NLNZ 2018 jefferson@archive.org NATION WHO WIDE WHAT WEBS WHY Jefferson Bailey, Director, Web Archiving

More information

Overview of the Netarkivet web archiving system

Overview of the Netarkivet web archiving system Overview of the Netarkivet web archiving system Lars R. Clausen Statsbiblioteket May 24, 2006 Abstract The Netarkivet web archiving system is creating to fulfill our obligation as national archives to

More information

Archivists Workbench: White Paper

Archivists Workbench: White Paper Archivists Workbench: White Paper Robin Chandler, Online Archive of California Bill Landis, University of California, Irvine Bradley Westbrook, University of California, San Diego 1 November 2001 Background

More information

Applying Auto-Data Classification Techniques for Large Data Sets

Applying Auto-Data Classification Techniques for Large Data Sets SESSION ID: PDAC-W02 Applying Auto-Data Classification Techniques for Large Data Sets Anchit Arora Program Manager InfoSec, Cisco The proliferation of data and increase in complexity 1995 2006 2014 2020

More information

Large Crawls of the Web for Linguistic Purposes

Large Crawls of the Web for Linguistic Purposes Large Crawls of the Web for Linguistic Purposes SSLMIT, University of Bologna Birmingham, July 2005 Outline Introduction 1 Introduction 2 3 Basics Heritrix My ongoing crawl 4 Filtering and cleaning 5 Annotation

More information

Custom Discovery Interface at the University of Michigan Library

Custom Discovery Interface at the University of Michigan Library University of Michigan Deep Blue deepblue.lib.umich.edu 2018-05-30 Custom Discovery Interface at the University of Michigan Library Varnum, Kenneth J. http://hdl.handle.net/2027.42/143857 Custom Discovery

More information

Building on to the Digital Preservation Foundation at Harvard Library. Andrea Goethals ABCD-Library Meeting June 27, 2016

Building on to the Digital Preservation Foundation at Harvard Library. Andrea Goethals ABCD-Library Meeting June 27, 2016 Building on to the Digital Preservation Foundation at Harvard Library Andrea Goethals ABCD-Library Meeting June 27, 2016 What do we already have? What do we still need? Where I ll focus DIGITAL PRESERVATION

More information

Still All on One Server: Perforce at Scale

Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale Dan Bloch Senior Site Reliability Engineer Google Inc. June 3, 2011 GOOGLE Google's mission: Organize the world's information and make it universally accessible

More information

Different Aspects of Digital Preservation

Different Aspects of Digital Preservation Different Aspects of Digital Preservation DCH-RP and EUDAT Workshop in Stockholm 3rd of June 2014 Börje Justrell Table of Content Definitions Strategies The Digital Archive Lifecycle 2 Digital preservation

More information

The MDR: A Grand Experiment in Storage & Preservation

The MDR: A Grand Experiment in Storage & Preservation The MDR: A Grand Experiment in Storage & Preservation Agenda Overview of the IA Web Archive MDR What is it and why deploy it? Before & After: Philosophy & Best Practices Wayback Access Services What s

More information

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY Data Staging and Data Movement with EUDAT Course Introduction Helsinki 10 th -12 th September, 2013 1400 Introduction Adam Carter Course Timetable TODAY 1430 The EUDAT Safe Replication Service Claudio

More information

A Mashup-Based Strategy for Migration to Web 2.0

A Mashup-Based Strategy for Migration to Web 2.0 A Mashup-Based Strategy for Migration to Web 2.0 Dr. Semih Çetin A Mashup-Based Strategy for Migration to Web 2.0 1 Content Statement of the problem and motivation Existing technologies and approaches

More information

Archivierung und Publikation von Forschungsdaten mit RADAR

Archivierung und Publikation von Forschungsdaten mit RADAR Archivierung und Publikation von Forschungsdaten mit RADAR Matthias Razum FIZ Karlsruhe Leibniz-Institut für Informationsinfrastruktur funded by RADAR IN A NUTSHELL RADAR = Research Data Repository Goal:

More information

Establishing Trust in Data Curation: OAIS and TRAC applied to a Data Staging Repository (DataStaR)

Establishing Trust in Data Curation: OAIS and TRAC applied to a Data Staging Repository (DataStaR) Establishing Trust in Data Curation: OAIS and TRAC applied to a Data Staging Repository (DataStaR) Gail Steinhart Cornell University Library Ann Green Digital Life Cycle Research & Consulting Dianne Dietrich

More information

Curators Needed How Public Libraries are Bringing Community Members into their Web Archiving Practice

Curators Needed How Public Libraries are Bringing Community Members into their Web Archiving Practice Curators Needed How Public Libraries are Bringing Community Members into their Web Archiving Practice Karl-Rainer Blumenthal, Internet Archive Emily Ward, East Baton Rouge Parish Library Natalie Milbrodt,

More information

Accessing Web Archives

Accessing Web Archives Accessing Web Archives Web Science Course 2017 Helge Holzmann 05/16/2017 Helge Holzmann (holzmann@l3s.de) Not today s topic http://blog.archive.org/2016/09/19/the-internet-archive-turns-20/ 05/16/2017

More information

IMF: What is IMF? What does the new mastering format mean to you? A newbies guide 04/03/18

IMF: What is IMF? What does the new mastering format mean to you? A newbies guide 04/03/18 IMF: What does the new mastering format mean to you? What is IMF? A newbies guide Mark Harrison (as Horton) & Bruce Devlin (as Dr. Seuss Bruce) 1 What is IMF? Is it a file? No! It s not a file And we don

More information

Big Data infrastructure and tools in libraries

Big Data infrastructure and tools in libraries Line Pouchard, PhD Purdue University Libraries Research Data Group Big Data infrastructure and tools in libraries 08/10/2016 DATA IN LIBRARIES: THE BIG PICTURE IFLA/ UNIVERSITY OF CHICAGO BIG DATA: A VERY

More information

Building Effective CyberGIS: FutureGrid. Marlon Pierce, Geoffrey Fox Indiana University

Building Effective CyberGIS: FutureGrid. Marlon Pierce, Geoffrey Fox Indiana University Building Effective CyberGIS: FutureGrid Marlon Pierce, Geoffrey Fox Indiana University Some Worthy Characteristics of CyberGIS Open Services, algorithms, data, standards, infrastructure Reproducible Can

More information

DATAVERSE FOR JOURNALS

DATAVERSE FOR JOURNALS DATAVERSE FOR JOURNALS Mercè Crosas, Ph.D. Director of Data Science IQSS, Harvard University @mercecrosas Society for Scholarly Publishing 37 th Meeting, 28, May, 2015 About Dataverse Science requires

More information