CDL's Web Archiving System
1 CDL's Web Archiving System Erik Hetzner UC3, California Digital Library 16 June 2011 Erik Hetzner (UC3, California Digital Library) CDL's Web Archiving System 16 June / 24
2 Introduction: We don't decide what to collect. We don't decide when to collect it. We build tools to allow curators to make those decisions.
3 Introduction: Vital statistics: 35 public archives; 16 partners; 2724 web sites; 289,272,095 URLs ( 2); 16.1 TB ( 2)
8 Introduction: The Web Archiving Service
9 Introduction: Archive
10 Introduction: Search
11 Introduction: Site list
12 Introduction: Archived page
13 How we do it: Collection focus (unofficial): Middle East political sites (Stanford); Social movements (Tamiment, NYU); California government sites (UC)
14 How we do it: Tools: Heritrix 1.14.x; Open-source Wayback; NutchWAX (moving to Solr); CDL's legacy Digital Preservation Repository; ... and a lot of UI code; ... ARC management, indexing scripts, etc.
15 Difficulties: Web archiving is easy*, but there are some difficulties.
16 Difficulties: Uneven coverage: We only crawl what our curators select.
17 Difficulties: Human selection: High precision; low recall.
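The precision/recall trade-off named on slide 17 can be illustrated with a small sketch. The numbers and site names below are hypothetical and are not taken from the talk; the sketch only assumes the usual definitions (precision = relevant selected / all selected, recall = relevant selected / all relevant).

```python
def precision_recall(selected: set, relevant: set) -> tuple:
    """Return (precision, recall) for a curator's selection of sites."""
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical numbers: 100 sites are relevant to a topic, and a curator
# hand-picks 10 of them, all of which are on topic.
relevant = {f"site{i}.example" for i in range(100)}
selected = {f"site{i}.example" for i in range(10)}

precision, recall = precision_recall(selected, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these assumed numbers every selected site is relevant, but most relevant sites are never crawled, which is the pattern the slide describes.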
18 Difficulties: Scale: We are not Internet Archive scale, but we are big enough that it takes a long time to do anything.
19 Difficulties: Collection mismatch: Our crawls are organized into collections. Everybody [?] else has one big archive.
20 Difficulties: Politics: We are customer-driven: we need to convince customers that collaboration is good for them.
21 Possibilities: What's on our plate: Deduplication... requires a new index (Solr); Moving to our new Merritt repository; Implementing Memento
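To show why deduplication depends on an index, here is a minimal sketch of digest-based deduplication. It is not CDL's implementation: the `Capture` record, the function names, and the in-memory dictionary (standing in for the Solr index the slide mentions) are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """One crawled response (hypothetical record shape)."""
    url: str
    timestamp: str  # 14-digit capture time, e.g. "20110616120000"
    payload: bytes


def payload_digest(payload: bytes) -> str:
    """Return a SHA-1 hex digest of the response body."""
    return hashlib.sha1(payload).hexdigest()


def deduplicate(captures):
    """Split captures into those to store and those needing only a pointer.

    `index` maps digest -> (url, timestamp) of the first capture seen with
    that content. A real system would query a persistent index instead.
    """
    index = {}
    stored, duplicates = [], []
    for cap in captures:
        digest = payload_digest(cap.payload)
        if digest in index:
            # Same bytes already held: record a reference, not a new copy.
            duplicates.append((cap.url, cap.timestamp, index[digest]))
        else:
            index[digest] = (cap.url, cap.timestamp)
            stored.append(cap)
    return stored, duplicates


captures = [
    Capture("http://example.org/", "20110601000000", b"<html>same</html>"),
    Capture("http://example.org/", "20110608000000", b"<html>same</html>"),
    Capture("http://example.org/", "20110615000000", b"<html>changed</html>"),
]
stored, duplicates = deduplicate(captures)
print(len(stored), "stored;", len(duplicates), "duplicate")
```

The lookup by digest is the step that needs an index: without a fast way to ask "have we already stored these bytes?", every new capture would have to be compared against the whole archive.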
25 Possibilities: Evaluating community needs: What do we have that you need? What do you have that we need?
26 Possibilities: Collaboration with researchers: The hard, fun problems are not necessarily the ones that we need solved. But maybe we can work it out.
27 Possibilities: Temporal search: How can we rank (and display) results across time?
28 Possibilities: Standards: Standards for sharing, or providing computational access to, metadata or full content.
29 Possibilities: The changing web: Flash and HTML5 throw a monkey wrench in the web.
30 Possibilities: Cross-archive collections: There is no reason why our curators should only be using our crawls. How can we build collections that span archives?
31 Possibilities: CDL's Web Archiving Service: We build tools; curators build collections. We are ready to be part of a global web archive infrastructure. What next? Thanks for having me, and thanks for listening. erik.hetzner@ucop.edu