DIGITAL PRESERVATION FOR THE MASSES: Using Archivematica and DSpace as Solutions for Small-sized Institutions (and other options)
Digital Commonwealth Annual Conference 2012
Joseph Fisher, Database Management Librarian @ UMass Lowell

Electronic Resources
Digitization Projects

- MBLC ILS grant to digitize the Paul E. Tsongas Congressional Papers
- Additionally included Lowell Historical Building Surveys
- Current proposal to digitize Tewksbury Almshouse records
Digital Commons repository
Digital Scholarly Services
NSF data management planning
Vice President, Digital Commonwealth

AGENDA
Why digital preservation?
For whom?
What it is
How to approach it
OAIS and TRAC
Basic requirements
Solutions: DuraCloud, LOCKSS, DSpace, Archivematica

WHERE THIS INFORMATION ORIGINATES
- Graduate (2011), University of Arizona SIRLS Graduate Certificate Program in Digital Information Management (DigIn)
- Digital Preservation Management Workshop: Implementing Short-term Strategies for Long-term Problems (attended 2004 at Cornell, and 2010 with ICPSR @ MIT)
- SAA Digital Archives Specialist (DAS) program: nine workshops and exams required for the DAS Certificate; 24 workshops currently offered in four sections, 8 of them online

WHY IS DIGITAL PRESERVATION IMPORTANT??
Obsolescence!!
Bit Rot!!

NOT JUST FOR LIBRARIES & ARCHIVES ANYMORE
- Researchers: coming soon to a government grant near you (data management planning)
- Records managers: the born-digital tsunami
- People: personal archiving

"Indeed, we are now all our own librarians."
Ellysa Stern Cahoy, Penn State University Libraries, in The Signal: Digital Preservation (Library of Congress blog), 4/9/2012, rchiving/

DIGITAL PRESERVATION: WHAT IS IT?
"The series of managed activities to ensure continued access to digital materials for as long as necessary."
DPC Handbook, Digital Preservation Coalition (2008)

Managed activities: defined very broadly; refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change.
Access: continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy, and functionality deemed to be essential for the purposes the digital material was created and/or acquired for. [see significant properties]
Authenticity: the trustworthiness of the electronic record as a record; that is, whatever is being cited is the same as it was when it was cited, unless the accompanying metadata indicates any changes.

FIVE ORGANIZATIONAL STAGES
1. Acknowledge: Understanding that digital preservation is a local concern
2. Act: Initiating digital preservation projects
3. Consolidate: Segueing from projects to programs
4. Institutionalize: Incorporating the larger environment and rationalizing programs
5. Externalize: Embracing inter-institutional collaboration and dependency

OAIS REFERENCE MODEL (OPEN ARCHIVAL INFORMATION SYSTEM)
Released by the Consultative Committee for Space Data Systems (CCSDS) in 1999
SIP: Submission Information Package (Producer)
- Appraisal & accession
- Validate & verify
- Virus protection & checksum
- File normalization (PDF/A)
- Metadata: descriptive, preservation, structural
AIP: Archival Information Package (Management)
- Store digital object(s) and associated metadata (Dublin Core, MODS, PREMIS, METS package)
- Migrate, error-check, refresh, replace
DIP: Dissemination Information Package (Consumer)
- Retrieval, delivery, and security
- Monitor Designated Community for changing needs

WHAT IS THE OPEN ARCHIVAL INFORMATION SYSTEM?
It's "Open" in the flexible sense of an outline, framework, or blueprint, and an "Information System" in the sense of a comprehensive, integrated, and complex conceptual construct.
ISO 14721:2003: "a collection of six high-level services, or functional components, that, taken together, fulfill the OAIS's dual role of preserving and providing access to the information in its custody."

SIX CORE OAIS REQUIREMENTS

1. Negotiate and accept appropriate information from Information Producers
2. Obtain sufficient intellectual control of the information to ensure long-term preservation
3. Determine the scope of the Designated Community
4. Ensure the information is understandable by the Designated Community without the assistance of the information producers
5. Follow clearly documented policies & procedures to ensure the information is preserved against all reasonable contingencies
6. Make the information available to the Designated Community

TDR AND TRAC: TRUSTWORTHY REPOSITORIES AUDIT & CERTIFICATION
Categories:
A. Organizational Infrastructure
- Governance, organizational structure, staffing & viability
- Procedural accountability & policy framework
- Financial sustainability, contracts, licenses, & liabilities
B. Digital Object Management
- Ingest: preservation strategies & processing procedures
- Workflows, documentation, records, & audit procedures
- Unique identifiers, metadata, & verification testing
- Preservation planning & strategies
- Access policies & designated community interaction
C. Technologies, Technical Infrastructure, & Security
- Software, updates, security
- Checksum error-checking
- Backups & disaster recovery
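Checksum error-checking, listed under category C, is a task a repository typically runs on a schedule. The sketch below is a minimal Python version. It assumes files are listed in a plain-text manifest of `<sha256>  <relative path>` lines, which is the layout BagIt uses for `manifest-sha256.txt`. The function names are hypothetical; they are not part of TRAC or of any repository product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(base_dir: Path, manifest: Path) -> dict[str, str]:
    """Check every file listed in a '<checksum>  <path>' manifest.

    Returns a dict mapping each relative path to "ok", "mismatch"
    (the file's bytes changed), or "missing" (the file is gone).
    """
    results: dict[str, str] = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = base_dir / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif sha256_of(target) != expected.lower():
            results[rel_path] = "mismatch"
        else:
            results[rel_path] = "ok"
    return results
```

Any path reported as "mismatch" or "missing" would then be restored from a backup copy, which ties this check to the "Backups & disaster recovery" item.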

ISO 16363
- The standard is titled the Trusted Digital Repository (TDR) Checklist
- Based upon the Trustworthy Repositories Audit & Certification (TRAC) checklist
- Published by CCSDS (The Consultative Committee for Space Data Systems) as a Magenta Book, Sep. 2011
- ISO approved the standard for publication in Mar. 2012
- The working group also wrote and submitted ISO 16919, entitled "Requirements for Bodies Providing Audit and Certification"

BASIC REQUIREMENTS OF DIGITAL PRESERVATION
- The more copies, the safer: replicate data on multiple storage systems
- The more independent the copies, the safer: save in different geographical locations; save on different types of technology systems
- The more frequently the copies are audited by checksum error-checking, the safer: audit or "scrub" the replicas to detect damage, and repair by overwriting the bad copy with a good copy
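Audit and repair can be sketched as a majority vote across replicas. Hash every copy, treat the digest held by most copies as correct, and overwrite any copy that disagrees with a good one. The Python sketch below assumes each replica is an ordinary file path on an independent storage system. This is a simplification: LOCKSS, for example, uses its own polling protocol rather than anything like this code. `audit_and_repair` is a hypothetical helper.

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def digest(path: Path) -> str:
    """Hex SHA-256 of a whole file (fine for a sketch; stream large files)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_and_repair(replicas: list[Path]) -> list[Path]:
    """Compare replicas of one object and overwrite damaged copies.

    Returns the replicas that were repaired. Raises RuntimeError when no
    digest is held by a strict majority, because the good copy is unknown.
    """
    digests = {path: digest(path) for path in replicas}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise RuntimeError("no majority among replicas; manual review needed")
    good_copy = next(p for p, d in digests.items() if d == winner)
    repaired = []
    for path, value in digests.items():
        if value != winner:
            shutil.copyfile(good_copy, path)  # overwrite the bad copy with a good one
            repaired.append(path)
    return repaired
```

This also shows why more copies are safer. With only two copies a disagreement cannot be resolved by voting, while three or more copies usually leave a clear majority.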

David S. H. Rosenthal, "Bit Preservation: A Solved Problem?" International Journal of Digital Curation 5.1 (2010)

SIP TO AIP
- Save and maintain at least one copy of the file exactly as is, in its original file format
- Convert: plan a copy for public use in PDF or JPEG, and migrate the use copy as formats change
- Normalize a copy to a preservation format if necessary (e.g., Word doc to PDF/A-1b); possibly migrate a copy of the Word doc as formats change
- Dublin Core descriptive record, and maybe a MODS record also, in XML
- PREMIS record in XML (preservation metadata)
- METS record in XML (structural metadata)

SO WHAT ARE SOME OPTIONS?
DuraCloud, LOCKSS, DSpace, Archivematica

LOCKSS
- Began development 1991 (beta release 2001)
- Still managed out of Stanford
- Global LOCKSS hosted at Stanford

- Private LOCKSS Networks (PLN) to preserve manuscript and image collections, data sets, etc.
- Example is the MetaArchive Cooperative:
  - First-year server purchase: $4,600
  - $1/GB/year, plus $5,500 or $3,00 annual membership
  - 1 TB = $24,100 for 3 years for a sustaining member (consistent with $4,600 + 3 × $1,000 storage + 3 × $5,500 membership)
  - Good example of a TRAC audit report (PDF available)
  - At least 6 nodes (so 6 copies)
  - Maintain storage server

DSPACE

- HP-MIT Libraries Alliance (2002)
- DuraSpace (2009)
- Current version 1.8.2 (24 Feb. 2012)
- Linux / Windows (Java)

DSpace preserves and enables easy and open access to all types of digital content, including text, images, moving images, MPEGs, and data sets. Beginning with 1.7 (Dec. 2010), it began adding significant digital curation functionality.

DSPACE DEVELOPMENT
1.7.0, released 17 Dec. 2010
- Discovery: enables faceted searching
- AIP backup and restore
- DuraCloud integration
- Export/import an entire hierarchy, community, or collection
- Curation System (CS):
  - Profile a collection based on format type
  - Check that required metadata fields are present
  - Enhance/replace/normalize an item's metadata or content
  - Checksum checker
1.8.0, released 4 Nov. 2011
- Bulk metadata editing
- SWORD client: push content to other SWORD repositories
- Creative Commons license rewrite
- Virus checking during submission
3.0, projected Oct./Nov. 2012
- Version number scheme changing to 2 digits: major releases increment the 1st digit, bug fixes the 2nd digit
- Item-level versioning features from the Dryad Project
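Two of the 1.7.0 Curation System tasks, "Check that required metadata fields are present" and "Checksum checker", are small per-item checks. DSpace's own curation tasks are written in Java against its API. The Python sketch below only illustrates the idea and makes several assumptions:
- `REQUIRED_FIELDS` is a hypothetical set of qualified Dublin Core fields.
- Items are plain dicts that map field names to lists of values, not DSpace's data model.
- The handles in the usage example are made up.

```python
# Hypothetical set of Dublin Core fields a collection might require.
REQUIRED_FIELDS = {"dc.title", "dc.contributor.author", "dc.date.issued"}


def missing_fields(item: dict[str, list[str]],
                   required: set[str] = REQUIRED_FIELDS) -> list[str]:
    """Return required fields that are absent or hold only blank values."""
    return sorted(
        name for name in required
        if not any(value.strip() for value in item.get(name, []))
    )


def curate_collection(items: dict[str, dict[str, list[str]]]) -> dict[str, list[str]]:
    """Map each item handle to its missing fields, skipping complete items."""
    report: dict[str, list[str]] = {}
    for handle, metadata in items.items():
        gaps = missing_fields(metadata)
        if gaps:
            report[handle] = gaps
    return report
```

For example, running `curate_collection` on a two-item collection where the second item has a blank title and no author would return only that second handle, listing `dc.contributor.author` and `dc.title`.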

DSPACE INSTALLATION
Prerequisite software:
- Linux or Windows
- Oracle Java JDK
- Maven (Java build tool for stage 1)
- Ant (Java build tool for stage 2)
- PostgreSQL or Oracle
- Tomcat
- Perl

ARCHIVEMATICA
A free and open-source digital preservation system. It uses a micro-services design pattern to provide an integrated suite of software tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model.
Managed by Artefactual Systems (Toronto) in collaboration with the UNESCO Memory of the World's Subcommittee on Technology, the City of Vancouver Archives, the University of British Columbia Library, the Rockefeller Archive Center, Simon Fraser University Archives and Records Management, and a number of other collaborators.
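The micro-services idea can be illustrated with a toy pipeline. Each stage is a small, independent function that reads and updates a shared SIP record, and a runner calls the stages in order. The stage names are snake_case borrowings from Archivematica's micro-service list (verifyChecksum, assignIdentifier, cleanFilename, identifyFormat). The implementations, the `SIP` class, and the signature table are all hypothetical stand-ins, not Archivematica code. The real system wraps external tools instead; the tool list includes Detox, ClamAV, and DROID, among others.

```python
import hashlib
import re
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SIP:
    """Toy submission package: one file's name, bytes, and gathered metadata."""
    filename: str
    content: bytes
    expected_sha256: str
    metadata: dict[str, str] = field(default_factory=dict)


def verify_checksum(sip: SIP) -> None:
    """Reject the package if its bytes do not match the supplied checksum."""
    if hashlib.sha256(sip.content).hexdigest() != sip.expected_sha256:
        raise ValueError(f"checksum mismatch for {sip.filename}")
    sip.metadata["fixity"] = "verified"


def assign_identifier(sip: SIP) -> None:
    """Give the package a random UUID."""
    sip.metadata["uuid"] = str(uuid.uuid4())


def clean_filename(sip: SIP) -> None:
    """Replace anything other than letters, digits, '.', '-', '_' with '_'."""
    sip.filename = re.sub(r"[^A-Za-z0-9._-]", "_", sip.filename)


# A few well-known file signatures ("magic numbers"), for illustration only.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def identify_format(sip: SIP) -> None:
    """Guess a MIME type from the leading bytes of the file."""
    sip.metadata["format"] = next(
        (mime for magic, mime in SIGNATURES.items() if sip.content.startswith(magic)),
        "application/octet-stream",
    )


PIPELINE: list[Callable[[SIP], None]] = [
    verify_checksum, assign_identifier, clean_filename, identify_format,
]


def run_pipeline(sip: SIP) -> SIP:
    """Run each micro-service in order; any stage may stop the run by raising."""
    for service in PIPELINE:
        service(sip)
    return sip
```

Each stage touches only the shared record, so one stage can be swapped out without changing the others. For instance, the signature table could be replaced by a call to a real format-identification tool. That independence is the point of the design pattern.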

ARCHIVEMATICA DEVELOPMENT
- 0.6 alpha release: 19 May 2010
- 0.7 alpha release: 18 Feb. 2011
- 0.8 alpha release: 3 Feb. 2012
  - Complete standards-compliant PREMIS in METS implementation
  - Multiple normalization options
  - Ability to ingest DSpace exports

Archivematica Appliance Installation in Oracle VM VirtualBox
1. Install Open Source VirtualBox

DOWNLOAD ARCHIVEMATICA APPLIANCE FILE
1. hivematica-0.8-alpha-vmdk.tbz
2. This requires something like 7-Zip to unpack it to this tar file: archivematica-0.8-alpha-vmdk2.tar
3. You then unpack that again to get the appliance installation file: archivematica-0.8-alpha.vmdk

- Create a new VM and set its OS to Linux/Ubuntu
- Accept the default memory allocation
- Point to the Archivematica vmdk appliance file
- Additional recommended configurations are outlined on the Archivematica site
- Requires some knowledge of the Linux command line

List of MicroServices and Tools used by Archivematica

- Receive SIP: verifyChecksum
- Review SIP: extractPackage, assignIdentifier, parseManifest, cleanFilename
- Quarantine SIP: lockAccess, virusCheck
- Appraise SIP: identifyFormat, validateFormat, extractMetadata, decidePreservationAction
- Prepare AIP: gatherMetadata, normalizeFiles, createPackage
- Review AIP: decideStorageAction
- Store AIP: writePackage, replicatePackage, auditfixity, readPackage, updatePackage
- Provide DIP: uploadPackage, updateMetadata
- Monitor Preservation: checkFormatRegistry

Tools: EXT3, Thunar, incron, flock, UUID, Detox, Easy Extract, ClamAV, FITS, JHOVE, DROID, NLNZ Extractor, FFident, Unoconv, FFmpeg, OpenOffice, ImageMagick, Inkscape, Xena, BagIt, SAMBA, NFS-common, Poster, ICA-AtoM, DCB, Dashboard

Live demo of Exercise One in this Archivematica Tutorial: https:// 05/Tutorial-08.pdf
Another good introductory tutorial is a YouTube video available on the home page of the Archivematica Wiki:

RECOMMENDATIONS:
- Library of Congress Digital Preservation Outreach & Education (DPOE)
- DPOE Webinars: Intro to Digital Preservation 1-3, by Jody DeRidder
- DCC Curation Lifecycle Model: How to use the Curation Lifecycle Model
