Digibess: thanks Islandora!
Arcidosso, Italy, March 20-22, 2013
Giancarlo Birello, Anna Perin
IT Office and Library, CNR-Ceris
BESS: a group of 18 socioeconomic libraries in Piemonte (Italy)
The libraries in the project share a common specialization but differ in size, institution, purpose, and collections. These differences, which could have been seen as a weakness, have turned out to be an asset. Bringing together such diverse bodies as private foundations, research institutes, and university libraries has freed up and disseminated a wealth of resources, know-how, and initiatives.
General info 1 of 4
The most important BESS initiative is the creation of a digital repository of sources for the study of Piedmontese society and economy (Digibess). The project is supported by Compagnia di San Paolo of Turin. The resulting repository serves as a stable and fundamental source of regional and economic information for the whole community. It also contains some interesting collections from partners such as the FIAT Historical Archive, the Lavazza Archive, and the Gramsci Foundation.
The laboratory is composed of:
- Qidenus automatic scanner: high-speed book scanning, max. format A4
- Bookeye planetary scanner for large formats
- 6 workstations to carry out all post-production steps
- a 15-terabyte NAS to store files
The entire workflow involves the following steps:
- scanning
- conversion of native files to high-resolution TIFF
- text file creation by Optical Character Recognition (OCR)
- preparation of metadata files
General info 2 of 4
The CNR-Ceris IT Office and the CNR-Ceris Library are responsible for all post-scan stages of the digitization. CNR-Ceris had to manage large volumes of data and provide storage for the digitized works that is stable, versatile, and dynamic. CNR-Ceris has deployed the software and server platforms of the repository in a virtualized and redundant infrastructure. CNR-Ceris also takes care of the design, development, and management of the web portal (front-end) for presenting, searching, and consulting the digitized items.
Some characteristics:
- files: high-resolution TIFF, OCR txt file, PDF/A, metadata
- 14 TB of base disk space available
- 2-node active/passive open-source cluster
- high-availability hypervisor using cluster storage
- Fedora Commons repository
- OAI-PMH harvesting
- scripting for ingesting
- Islandora model and modules
- Drupal front-end
- Solr, the search platform from the Apache Lucene project
- full-text search
General info 3 of 4
Total budget:
- 90% for digitization (digital laboratory, digitization staff)
- 10% for the repository
Repository budget: 90% for hardware (9% of the total budget)
Thank you to:
- open source solutions
- many nights of work (for the system manager...)
But YES, HE DID IT!
General info 4 of 4
Architecture 1 of 2
Architecture: cluster, hypervisor, virtual machines, network (VLANs, IPv6)
Architecture: two-server solution (front-end/back-end), with multiple front-ends possible for one back-end
Architecture 2 of 2
Software: open source, Linux (DRBD, Corosync, Pacemaker), KVM, Fedora Commons and...
CLUSTER: DRBD, Corosync, Pacemaker, Ubuntu
HYPERVISOR: KVM, Ubuntu
REPOSITORY: Fedora, Solr, Apache, Tomcat, Ubuntu
FRONT-END: Apache, Drupal, Ubuntu... and?
Software 1 of 6
Islandora as front-end: flexible, open, clear code, high activity
Software 2 of 6
Software 3 of 6
Islandora: a good guide to Fedora Commons models
Islandora Book Solution Pack (this is our case: collections, books, and pages): search, models, views
Software 4 of 6
Islandora Image Viewer (IIV): view and read online, with text ("T")
Open source, NO Flash player, NO external services
Software 5 of 6
Islandora has more useful features (for our project we use only the presentation layer):
- ingesting
- object and collection management
- roles manager
- workflow
Software 6 of 6
The project was a close collaboration between librarian and system manager: the librarian asks for features and the system administrator customizes the code.
MODELS, SEARCH, VIEW, QUERY: YES! Yes! Yes! Yes!
Digibess 1 of 1
Please: For METADATA, basic DC is OK, no MODS.
MODS removed. Done!
Digibess Model 1 of 6
Digibess Model 2 of 6 Dublin Core metadata
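To make the "basic DC" choice concrete, here is a minimal sketch of how a Dublin Core record could be serialized. The two namespace URIs are the standard oai_dc and Dublin Core ones; the function name `build_dc_record` and the sample field values are hypothetical and are not taken from the Digibess scripts.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for an oai_dc container with Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Serialize a dict of DC element name -> list of values to XML text.

    Hypothetical helper for illustration; element names are not validated.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for the example.
record = build_dc_record({
    "title": ["Example periodical, issue 1"],
    "description": ["1950-01"],
    "type": ["Text"],
})
print(record)
```

Keeping only DC means each object carries a single flat list of elements like the one above, which is also what the OAI_DC harvesting format expects.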
Please: We need a book INDEX.
Datastream INDEX added to bookcmodel. Done!
Digibess Model 3 of 6
Digibess Model 4 of 6
Index tab added to the book view
Please: We need collection info.
Datastream INFO added to collectioncmodel. Done!
Digibess Model 5 of 6
Info tab added to the collection and book views
Digibess Model 6 of 6
Please: We need full-text search results with highlighting.
Search result view modified. Done!
Digibess Search 1 of 4
Full-text search results
DC search returns books; full-text search returns text, pages, and books.
Digibess Search 2 of 4
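Highlighted full-text results of this kind are typically obtained through Solr's highlighting parameters. The sketch below only builds the request URL; `hl`, `hl.fl`, `hl.snippets`, `rows`, and `wt` are standard Solr parameters, while the select URL and the field name `ocr_text` are assumptions, since the slides do not show the Digibess Solr schema.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_fulltext_query(solr_select_url, terms, text_field="ocr_text", rows=10):
    """Build a Solr select URL that asks for highlighted snippets.

    solr_select_url and text_field are placeholders (assumptions), not the
    actual Digibess configuration.
    """
    params = {
        "q": f"{text_field}:({terms})",  # search the OCR text field
        "hl": "true",                    # turn highlighting on
        "hl.fl": text_field,             # field to take snippets from
        "hl.snippets": 3,                # up to three snippets per document
        "rows": rows,
        "wt": "json",
    }
    return f"{solr_select_url}?{urlencode(params)}"


url = build_fulltext_query("http://localhost:8080/solr/select", "Torino")
print(url)
```

The response would then carry a `highlighting` section keyed by document, which the modified search result view can render next to each page or book hit.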
Please: We need multiple-word search.
Search form and syntax modified. Done!
Digibess Search 3 of 4
Multiple-word search example
Digibess Search 4 of 4
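The slides do not show the modified search syntax, so the following is only a sketch of one common approach: split the user input into words, escape Lucene special characters, and require all words with AND. The field name `ocr_text`, the function names, and the choice of AND are assumptions.

```python
import re

# Characters and operators with special meaning in the Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape Lucene special characters in a single word."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def build_multiword_query(user_input, field="ocr_text", operator="AND"):
    """Turn free text into a query that combines every word with `operator`.

    The field name and the default operator are assumptions for illustration.
    """
    words = [escape_term(word) for word in user_input.split()]
    if not words:
        raise ValueError("empty search input")
    joined = f" {operator} ".join(words)
    return f"{field}:({joined})"


print(build_multiword_query("industria  tessile Piemonte"))
```

Passing `operator="OR"` would instead return documents containing any of the words, which is the usual trade-off between precision and recall in a search form.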
Please: We need to enable download of pdf, tiff, and txt files.
Book and page views modified. Done!
Digibess View 1 of 5
Book and page views with file links Digibess View 2 of 5
Please: Reverse order by dc.description for some collections (Periodicals).
Custom collection datastream QUERY added. Done!
Digibess View 3 of 5
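In Digibess the ordering is done by the custom QUERY datastream itself, which is not shown in the slides. As a stand-in, the sketch below illustrates the intended effect in plain Python: collections flagged for reverse ordering list their items by description in descending order. The function name, the flag, and the sample PIDs and descriptions are all hypothetical.

```python
def order_collection_items(items, reverse_by_description=False):
    """Return collection items in display order.

    Illustration only: this mimics the result of a per-collection query,
    not the query language used in the repository.
    """
    if reverse_by_description:
        # Periodicals: descending dc.description, so later issues come first.
        return sorted(items, key=lambda item: item["description"], reverse=True)
    # Other collections: keep the order returned by the default query.
    return list(items)


# Invented sample data: descriptions hold sortable issue dates.
issues = [
    {"pid": "demo:1", "description": "1950-01"},
    {"pid": "demo:2", "description": "1951-01"},
    {"pid": "demo:3", "description": "1950-06"},
]
newest_first = order_collection_items(issues, reverse_by_description=True)
print([item["pid"] for item in newest_first])
```

This only works as expected when the descriptions sort lexicographically in the desired order, for example dates written year first.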
Please: We need repository statistics.
New object with custom QUERY and QUERYCOLLECTION_VIEW. Done!
Digibess View 4 of 5
Digibess View 5 of 5 Statistics
Ingesting by scripts (from Word, native pdf, jpeg, tiff)
Digibess Ingesting 1 of 1
OAIPROVIDER: oai_dc and pico metadata with virtual datastream
OAI-PMH Data Provider
The repository exposes an OAI-PMH interface for external metadata harvesting. Metadata can be disseminated in two formats: OAI_DC and PICO. OAI_DC records are extracted from the object's DC datastream. PICO records are generated on the fly by an object service.
Digibess Harvesting 1 of 1
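From a harvester's point of view, the two formats differ only in the `metadataPrefix` argument of the request. The sketch below builds such request URLs without contacting any server; `verb`, `ListRecords`, `ListMetadataFormats`, and `metadataPrefix` come from the OAI-PMH protocol, while the base URL and the prefix string `pico` are assumptions that a `ListMetadataFormats` response from the real provider would confirm.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); the real provider URL is not given here.
BASE_URL = "http://example.org/oaiprovider/"


def build_oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments explicitly passed as None so they stay optional.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


formats_url = build_oai_request(BASE_URL, "ListMetadataFormats")
dc_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
pico_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="pico")

for url in (formats_url, dc_url, pico_url):
    print(url)
```

A harvester would normally call `ListMetadataFormats` first and then request records only in the prefixes the provider advertises.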
Development site: http://dev.digibess.it
Step-by-step installation guide: http://www.digibess.it/fedora/repository/openbess:to094-00163/pdf/openbess_to094-00163.pdf
Mark told me I did a great job with Islandora and invited me to join Islandora Camp Europe. Done!
Please: I need holidays...
Thanks ISLANDORA!
Anna: a.perin@ceris.cnr.it
Giancarlo: g.birello@ceris.cnr.it