
File Inventory with DROID

Updated January 2018

Tool Homepage: http://www.nationalarchives.gov.uk/information-management/manage-information/policy-process/digital-continuity/file-profiling-tool-droid/

Introduction

The Digital Record Object Identification (DROID) File Profile Tool was developed by The National Archives (of the UK) to perform automated identification of file formats. It uses file format signatures (characteristic byte sequences within a file) to identify the file format and version beyond what can be identified by the file extension alone. It draws on the PRONOM database (also managed by The National Archives), found here: http://www.nationalarchives.gov.uk/pronom/default.aspx. DROID is free and open source under the New BSD License.

There is an extensive user guide covering both the GUI version and the command-line tools: http://www.nationalarchives.gov.uk/documents/information-management/droid-user-guide.pdf

Below is an example workflow of file inventorying at the American Geographical Society Library at UWM Libraries.

Using DROID

DROID is a Java application. After downloading the tool, unzip the files into their final location; there is no installer. To run the tool, run the file called droid.bat (Mac and Linux versions will have a droid.sh). Detailed instructions for installation and set-up can be found in the user guide.

1. When you run DROID, it will bring you directly to the main interface. A tab called Untitled-1 will appear. Before doing anything, it is important to set up your preferences. Any changes to the preferences will only take effect on profiles created after the changes are made.
   a. Go to Tools > Preferences.
   b. Ensure that "Analyse the contents of archive files" is checked if you wish to see files inside of .zip, .tar.gz, or .gzip files.
   c. If you are analyzing file fixity or scanning for duplicate files, generating a hash for each file can be useful. Click the check box and choose md5.
A hash is a short string of letters and numbers computed from the actual contents of a file. If two files are identical, they will have identical hash values. If a file changes, so will its hash value. This value is useful for fixity checking and for duplicate detection. However, generating hashes significantly increases the time required for scanning.

   d. Under the Signature Updates tab, I recommend changing the update frequency to "Every time DROID starts up", especially if you have file formats that are actively updated.
2. Once you're satisfied with your preferences, close any automatically created profile and click the green New button. This will create a new, untitled profile with the settings specified.
3. Next, add the directory or directories to be analyzed by clicking the green Add button. Be sure to check "Include sub-folders" in the Select Resources form. In this example, the Portage_Co directory is the target for profiling.
4. Before starting the scan, save your profile. Click the Save button. Save the profile somewhere that you can access it. I recommend saving with a date in the filename so that you can compare new profiles to this old one in the future. Saving at this point makes it easy to run the analysis again if anything goes wrong.

5. Click Start. The scan can take anywhere from 30 seconds for small folders to hours or days for very large ones. If running a large profile in the background, you can use the throttle slider at the bottom of the application window to slow down the scan and free up more resources. You will be able to start inspecting the profile during the scan.
6. When the scan is completed, I recommend saving immediately. Saving can take several minutes or longer for large profiles.
7. There are a few built-in tools to analyze the profile. First, you can filter the results by any of the fields included in the report. You can also use the Report tool to generate a report.
   a. Click the Report button and then select the profile(s) you wish to report on.
   b. Under Select Report you have a few options; choose the most appropriate for your analysis. Reports range from simple file counts and sizes to comprehensive breakdowns of each individual file, file type, and more.
   c. You can view the report as a stylized document right in the application or click Export to export the report to XML, HTML, TXT, or PDF.
8. The most useful function in DROID is the ability to export the profile to a text file and then bring it into a database for further analysis.
   a. Click Export.
   b. Select the profile(s) you wish to export. Make sure that "One row per file" is selected if you want a comprehensive list. If you're only interested in the file formats present, use the other option. (You can select the default option for this in the preferences.) Click Export Profiles.
   c. I recommend saving the export in the same location where you saved the profile, again using a date in the file name so that you can compare it to new exports in the future. By default, this file will not have a file extension, but I recommend adding .csv, because it is a text file of comma-separated values.
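The hash option from step 1c is what makes a later fixity check possible. As a minimal sketch (not part of DROID itself), the Python below re-hashes each file listed in an export and reports any file whose MD5 no longer matches the recorded value. It assumes the export is a UTF-8 CSV whose header row includes FILE_PATH and MD5_HASH; the function names and the demonstration files are hypothetical.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(export_csv):
    """Return paths whose current MD5 differs from the one in the export.

    Assumes FILE_PATH and MD5_HASH columns. Rows without a recorded hash
    (such as folders) are skipped; missing files count as changed.
    """
    changed = []
    with open(export_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            recorded = (row.get("MD5_HASH") or "").strip().lower()
            if not recorded:
                continue
            path = row["FILE_PATH"]
            if not Path(path).is_file() or md5_of_file(path) != recorded:
                changed.append(path)
    return changed


# Self-contained demonstration using temporary, made-up files.
with tempfile.TemporaryDirectory() as tmp:
    map_file = Path(tmp) / "map.txt"
    map_file.write_text("Portage County", encoding="utf-8")

    export = Path(tmp) / "export.csv"
    with open(export, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["FILE_PATH", "MD5_HASH"])
        writer.writerow([str(map_file), md5_of_file(map_file)])

    unchanged_result = changed_files(export)  # file not yet modified
    map_file.write_text("Portage County, edited", encoding="utf-8")
    changed_result = changed_files(export)    # file modified after export
    demo_path = str(map_file)
    print(unchanged_result, changed_result)
```

Reading files in chunks keeps memory use flat for large files, which matters for the multi-gigabyte items a library collection may contain.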

Importing a DROID export into MS Access

1. Open Microsoft Access and make a new blank desktop database in your desired location. You can reuse the database when you make a new export in the future.
2. In the ribbon, under the External Data tab and in the Import & Link section, click Text File to add data from text.
3. In the form that opens, ensure that "Import the source data into a new table in the current database" is selected. Browse for your export CSV and click Open.
4. Click OK in the Get External Data form.
5. Choose Delimited and click Next.
6. Check the box "First Row Contains Field Names", ensure Comma is the selected delimiter, and ensure that Text Qualifier is set to a double quotation mark ("). If the table looks correct in the preview, click Next.
7. In this view, you are specifying data types for the fields. There are a few very important things to change here. Use the horizontal scroll bar to find the ID column. Change the Data Type to Integer. Do the same for the PARENT_ID and FORMAT_COUNT fields. Do the same for the SIZE field, except choose Long Integer. Note that the Access Integer type only holds values up to 32,767, so for profiles with more rows than that, choose Long Integer for ID and PARENT_ID as well. Click Next.
8. This view asks about keys. In this case, our data already has an identifier field called ID. Select "Choose my own primary key" and choose ID. Click Next.
9. The last view will have you name your table. I recommend including a date if you plan to make future DROID profiles and wish to compare them.
10. Click Finish and save your database.

If all went according to plan, you now have a table listing every file in the scanned directory. The possibilities for what we can do with this data are plentiful.
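For readers who want to script the same import, the steps above can be approximated outside Access. The sketch below swaps in Python's built-in sqlite3 module for Access; it is not the workflow the guide describes. It mirrors the choices made in the wizard: the first row supplies field names, ID becomes the primary key, and ID, PARENT_ID, FORMAT_COUNT, and SIZE are stored as integers. The function name, the table name, and the sample rows are invented for illustration.

```python
import csv
import io
import sqlite3

# Fields the Access steps convert to whole numbers.
INTEGER_FIELDS = {"ID", "PARENT_ID", "FORMAT_COUNT", "SIZE"}


def import_export(connection, table_name, csv_handle):
    """Create table_name from a DROID-style CSV and return the row count."""
    reader = csv.DictReader(csv_handle)  # first row holds the field names
    columns = list(reader.fieldnames)

    definitions = []
    for name in columns:
        if name == "ID":
            definitions.append('"ID" INTEGER PRIMARY KEY')
        elif name in INTEGER_FIELDS:
            definitions.append(f'"{name}" INTEGER')
        else:
            definitions.append(f'"{name}" TEXT')
    column_sql = ", ".join(definitions)
    connection.execute(f'CREATE TABLE "{table_name}" ({column_sql})')

    def convert(name, value):
        if name in INTEGER_FIELDS:
            return int(value) if value else None  # blank cells become NULL
        return value

    rows = [[convert(name, row[name]) for name in columns] for row in reader]
    placeholders = ", ".join("?" for _ in columns)
    connection.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
    )
    connection.commit()
    return len(rows)


# Invented sample rows standing in for a real export.
SAMPLE = '''"ID","PARENT_ID","NAME","SIZE","EXT","FORMAT_COUNT"
"1","","Portage_Co","","",""
"2","1","roads.shp","2048","shp","1"
"3","1","notes.txt","120","txt","1"
'''

connection = sqlite3.connect(":memory:")
imported = import_export(connection, "droid_2018_01_15", io.StringIO(SAMPLE))
total_size = connection.execute(
    'SELECT SUM("SIZE") FROM "droid_2018_01_15"'
).fetchone()[0]
print(imported, total_size)
```

Including a date in the table name, as in step 9, lets several exports sit side by side in one database for later comparison.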

Analyzing a DROID export in MS Access: Duplicate Detection

Access offers many possibilities for analysis; one example is duplicate detection, which can take advantage of some built-in tools in MS Access. Data fields available for analysis include an ID number (based on the location in the file tree), a Parent ID, File Path, URI (based on file path), the size in bytes, profiled file type information, extension, filename, etc. See the duplicate detection workflow document for information on duplicate detection using Duplicate Cleaner Free 4.0.5.

Remember that if you add, move, or remove files in your directory, IDs and PARENT_IDs will change, so if you are looking for a persistent ID, the best bet is the URI field. It's a little long, but a file in the same place will have the same URI even if the file's contents are changed (but not if it's moved).

1. Duplicate Detection
   a. On the ribbon, under Create, click Query Wizard.
   b. On the New Query form, choose Find Duplicates Query Wizard. Click OK.
   c. Select the table in which you would like to detect duplicates.
   d. When the wizard asks "Which fields might contain duplicate information?" select MD5_HASH. If you don't have this field available, make a new profile, paying special attention to step 1c in Using DROID.
   e. When the wizard asks about other fields you want to include in the query, choose what you think will be useful. I normally use the following: ID, PARENT_ID, URI, NAME, SIZE, EXT.
   f. If you want to save your query, give it a name. Otherwise, accept the default and click Finish.
   g. Some important notes about duplicate detection:
      i. Duplicates and their originals (or masters) are all displayed. If you have three identical files, all three will be shown.
      ii. Duplicates identified may not share a file name. In the duplicate detection run for this example, DROID found 6 files with identical hash and size, but differing names.
Armed with this information, one can now identify which of these files should be preserved and which can be discarded.
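The logic behind the Find Duplicates query can also be sketched in a few lines of Python, which may help clarify what the wizard is doing: rows are grouped by MD5_HASH, and every member of any group containing more than one file is kept. This is an illustration under assumptions, not the Access implementation; the function name and sample rows are invented, and the hash values shown are placeholders rather than real MD5 digests.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(csv_handle, hash_field="MD5_HASH"):
    """Group export rows by hash and keep only groups of two or more files."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_handle):
        file_hash = (row.get(hash_field) or "").strip()
        if file_hash:  # skip rows with no recorded hash, such as folders
            groups[file_hash].append(row)
    return {h: rows for h, rows in groups.items() if len(rows) > 1}


# Invented sample rows; the hash values are placeholders, not real MD5s.
SAMPLE = '''"ID","PARENT_ID","NAME","SIZE","EXT","MD5_HASH"
"1","","Portage_Co","","",""
"2","1","plat_map.tif","5000","tif","aaa111"
"3","1","plat_map_copy.tif","5000","tif","aaa111"
"4","1","notes.txt","120","txt","bbb222"
'''

duplicates = find_duplicates(io.StringIO(SAMPLE))
for file_hash, rows in duplicates.items():
    print(file_hash, [row["NAME"] for row in rows])
```

As in the Access query, originals and copies are listed together, and files with different names still land in the same group when their contents match.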