Trials and Tribulations: Archiving Electronic Records Adam Jansen
Digital Archivist
Washington State Archives

Records and Information
or, Why we do what we do
If: Information is power
And: Records are storage of information
Then: Records must be preserved for future generations

Shifting Media
Historically, records were stored on paper and kept in filing cabinets
When the cabinet was full, records were sent to the file room
Now records are stored electronically on computers
When the computer is full, add more hard drives
Basic skills to manage and maintain records have been lost, replaced by infinite storage

Higher Standards
As electronic records become more integrated into society, producers of those records will be held to higher standards of conduct
HIPAA
SOx
Federal and State Mandates
Case Law

WA Public Records Laws
As defined in RCW 40.14: ANY records that have been made by or received by any agency of the state of Washington in connection with the transaction of public business

Records Retention
Any destruction of official public records shall be pursuant to a schedule approved under RCW 40.14

Why?...
The foundation of democracy in America is government accountability to the people
So the question becomes: who takes care of the records, and do they have the knowledge?

Caretakers of Information
Historically, records were sent to the file room; staff maintained access to records and managed their lifecycle based on need and legal requirements
Now records are managed by users and IT staff, based on capacity and cost
Neither is trained in the science of information management

Why a Digital Archives?
Comply with statutory & regulatory mandates. The law requires preservation of certain public records; it doesn't specify whether those records are paper or electronic. All records must be given the same care.
Avoid loss of legal & historical records. As technology changes, older media (5¼-inch floppy disks, for instance) become harder to read.
Centralize records. Centralization means uniformity in maintenance. Trained professionals serve as caretakers.
Preserve rare and at-risk paper records
Improved access for citizens. By centralizing historical electronic records in one location, one-stop shopping will provide the information quicker and easier.

What the Digital Archives is not
Not mass storage for active business applications & data
Not remote back-up for state & local government networks & data

The Digital Archives will:
Preserve electronic records with long-term legal, historical and/or fiscal significance
Assure platform-neutral retrieval 50, 100, or more years from now
Provide security back-up of certain permanent electronic legal records (courts, vital records, land records, etc.)

Project History
2001 Session: Legislative approval (SSB 6155, 2001-2003 Capital Budget)
January to September 2002: Building programming
January 2003: Building construction begins
September 2003: ISB technology review
October 2004: Grand Opening
Q4 2006: Full implementation

Monies In and Out
Primary funding source: $1 surcharge
Expenditures:
$14.5M joint use facility
$1.5M technology acquisition
$950,000 software development
Ongoing budget of $2.1M/year

Requirements to E-Archive
Hardware
Software
Management
Authenticity

Hardware
File Room of the 21st century
Capacity and speed double every 18 months
Many choices: Tape, Optical, Spinning Disc

First Immutable Law of Digital Archiving
What hardware you use today will be obsolete within four years

Washington State Digital Archives Network Configuration, May 2, 2005
[Network diagram. Components shown: 2 Coyote hardware load balancers; firewall; domain controllers DA-DC1 and DA-DMZ-DC2; Web/FTP server; Web Services server; BizTalk server (MS BizTalk 2004 Enterprise); Administration server; Database server (MS SQL Server 2000, MS Clustering Active/Active); shared disk array. Server hardware: HP DL380 (2 × 3 GHz HT CPU, 2 GB RAM, 36 GB mirrored or 144 GB RAID 5 disk, MS Windows 2003 Standard) and HP DL580 (4 × 3 GHz CPU, 4 GB RAM, 36 GB mirrored disk, MS Windows 2003 Enterprise).]

Digital Archives Hardware
Network: Cisco backbone end to end, LAN and SAN
EMC SAN storage: 5 TB now, 20 TB by end of year
HP servers and desktops
ADIC tape library for offsite, disaster recovery
Microsoft Software and Development w/ EDS

Archival Software
Formats:
Native
ASCII
TIF
PDF/A
XML
Whenever possible seek the open, documented solution!
Remember WordStar and dBase II???

File Formats
Digital Archives multi-pronged approach, stored as BLOBs in DB with metadata:
Maintain native format, wrapped
Create open file format version
Render XML formatted version, wrapped
Acquire original hardware and software

Content Management
Essential to maintain control of the information explosion
Allows hard-coded rules and information exchange
BUT still requires a strong knowledge, understanding and implementation of basic records management

Second Immutable Law of Digital Archiving
Data is data, a record is a record; it is the content that drives retention, not the media

Content Management
Not true CM but rather archival storage and retrieval
DoD 5015.2-STD compliant system
Wrap original file in native format
Wrap XML copy
Apply metadata & XML for indexing, searching & retrieval
Provide chain of custody & authenticity

Content Management
Microsoft solution
Custom-coded .NET front end
SQL Server back end
BizTalk translation utility
SSH Tectia for secure transport

Authenticity
Maintain chain of custody
In the care of trusted 3rd party
Received from trusted, known source
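
The Content Management slides above describe wrapping the original file, applying metadata, and providing chain of custody and authenticity. A minimal sketch of that idea follows, in Python rather than the .NET/BizTalk stack the slides name. The element names (Record, Metadata, Fixity, OriginalFile), the base64 encoding of the payload, and the function names are all assumptions made for illustration; the actual Digital Archives schema is not given in the slides. MD5 is used only because the deck lists an MD5 hash on original files.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_record(original: bytes, metadata: dict[str, str]) -> str:
    """Wrap original file bytes in an XML envelope with metadata and a digest.

    Element names are hypothetical; metadata keys must be valid XML names.
    """
    record = ET.Element("Record")
    meta = ET.SubElement(record, "Metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = value
    # Fixity value: MD5 digest of the untouched original bytes.
    fixity = ET.SubElement(record, "Fixity", algorithm="MD5")
    fixity.text = hashlib.md5(original).hexdigest()
    # The original file travels inside the wrapper, base64-encoded.
    payload = ET.SubElement(record, "OriginalFile", encoding="base64")
    payload.text = base64.b64encode(original).decode("ascii")
    return ET.tostring(record, encoding="unicode")


def verify_record(xml_text: str) -> bool:
    """Recompute the digest of the wrapped file and compare it to the stored one."""
    record = ET.fromstring(xml_text)
    original = base64.b64decode(record.findtext("OriginalFile"))
    return hashlib.md5(original).hexdigest() == record.findtext("Fixity")


if __name__ == "__main__":
    wrapped = wrap_record(b"example deed text", {"Agency": "Example County Auditor"})
    print(verify_record(wrapped))
```

Any change to the wrapped bytes after ingestion makes verify_record return False, which is the property a chain of custody relies on. A stronger digest such as hashlib.sha256 could be swapped in without changing the structure.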

Data Security
Encrypted SSH FTP transmission
Issue digital certificate
Verify IP and computer information
MD5 hash on all original files
Copy of FTP on tape prior to ingestion
DB backups on tape
Record-level security for confidential info

FTP Fingerprint
FTPUpload Date="8/23/2005 9:13:05 AM" NTUserName="temp" Domain="CRISPLUS" SFTPUserName="FranklinCoAuditor"
HostInformation
Primary WINS Server . . . . . . . : 172.30.7.2
Secondary WINS Server . . . . . . : 18.104.22.168

Record Level Security
Restrict records at item, field or series level
Restrict to individual, dept, office or global
Uses authenticated login to reveal fields
Anonymous users see "Restricted"

Open Record

Restricted Record

Confidential MOU

Ingestion Process
MUST be flexible
No mandate and 3300 agencies
Microsoft BizTalk 2004
Transforms, adds metadata based on business rules
Creates deep storage copy wrapping original file in XML, with hash
Creates web version of original file

BizTalk 2004
[Diagram: BizTalk predefined pipelines map variant field names (fname, firstname, Fst_name, first) to First_Name, and variant date formats (Jun-07-05, 07-Jun-05, 06/07/05, 06/07/2005) to 06/07/2005.]

Deep Storage XML Schema
[Diagram: a Record with a Common section (Who, What, Type, When, Where, Original File, Web File, Security, Fixity) and a record-type section, shown as Vital Records: Birth (Date of, Father, Mother, Hospital).]

Deep Storage XML

Archive Database
Designed around latest industry standards
Open source, non-proprietary file storage
Applies metadata tags to save information about record creator, date, agency, subject, etc.
Provides chain of custody & authenticity of record
Allows search and retrieval of archival records through a web page

Web Design Wire Frame
www.digitalarchives.wa.gov

Admin Pages
Requires authenticated log-in
Allows viewing of confidential information
E-Transmittal process
Viewing of open orders

Who's Visiting???
Avg over 300 visits per day
Avg length of stay 9 minutes
6% .gov - 4% .edu - 1% .org
13% came from Internet search (Google, MSN, Yahoo)
Visitors from: Canada, US Military, Romania, Germany, France, Australia, Japan, UK, Netherlands, Russia, Thailand, Portugal, Belgium, Poland, Italy, Indonesia, Singapore, Sweden, Mexico, New Zealand, Czech Republic, Hungary, Brazil, Norway, Colombia, Austria, Greece, Bulgaria, China, Yugoslavia, Philippines, Spain, South Korea, Denmark, Oman, Pakistan, South Africa, Jamaica, Switzerland

Risks
Distributed, non-standardized environment
No mandate to use Digital Archives
Limited technology expertise in some agencies
Unpredictable data growth rate
Few business models
Emerging technologies
Limited internal expertise

Management Issues
Authenticity of record
Metadata
File naming conventions
Corporate culture
Start small with e-mail, web page
Use existing retention schedules
Educate
Shift AWAY from desktops
Management software is a must!
Privacy of sensitive data

Third Immutable Law
Anything that you do today will need a major overhaul in two years
Technology and industry are changing at unprecedented rates
But more records are lost every day!
Key is to be flexible and attack with forethought

Digital Archives
Eastern Washington University, Cheney, Washington
Adam Jansen
Digital Archivist
[email protected]
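
The ingestion slides earlier in the deck describe BizTalk transforming incoming data by business rules, with variant field names (fname, firstname, Fst_name, first) mapped to First_Name and variant date spellings mapped to 06/07/2005. The sketch below illustrates that kind of normalization in plain Python; it is not BizTalk code. The alias table, the list of date formats, and the function names are assumptions built only from the examples on the slide, and month-name parsing assumes an English locale.

```python
from datetime import datetime

# Hypothetical alias table, built from the field names shown on the slide.
FIELD_ALIASES = {
    "fname": "First_Name",
    "firstname": "First_Name",
    "fst_name": "First_Name",
    "first": "First_Name",
}

# Input date formats matching the slide's examples:
# Jun-07-05, 07-Jun-05, 06/07/05, 06/07/2005
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%y", "%m/%d/%Y")


def normalize_field(name: str) -> str:
    """Map a variant field name to its canonical form; unknown names pass through."""
    return FIELD_ALIASES.get(name.strip().lower(), name)


def normalize_date(text: str) -> str:
    """Parse a date in any known format and re-emit it as MM/DD/YYYY."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {text!r}")


def normalize_record(record: dict[str, str], date_fields: set[str]) -> dict[str, str]:
    """Apply field-name and date normalization to one incoming record."""
    result = {}
    for name, value in record.items():
        canonical = normalize_field(name)
        result[canonical] = normalize_date(value) if canonical in date_fields else value
    return result


if __name__ == "__main__":
    incoming = {"Fst_name": "Ada", "Filed": "07-Jun-05"}
    print(normalize_record(incoming, date_fields={"Filed"}))
```

Keeping the rules in lookup tables rather than in code reflects the flexibility the ingestion slide asks for: a new agency's field names or date style can be added as data.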

Secure FTP
Custom FTP Configuration
Uses SSH Tectia client
128-bit encryption
Ease of use
Minimal user interaction/intervention
Simple notification
XML log file output
Digital footprint

Right Click Send to

Drag and Drop

Double Click Send

Notifications
Minimal notification
Minimal user interaction
Ease of understanding of notifications
Quick notification of errors
Ease of cleanup of sent files

No Data Error

Duplicates

Possible Errors

Completion

Delete

E-Commerce

Add to Shopping Cart

Ecommerce Functionality
Add to shopping cart
Shopping cart
Shipping info
Billing information
View and submit order
Confirmation
Order request