Trials and Tribulations: Archiving Electronic Records Adam Jansen

Adam Jansen
Digital Archivist
Washington State Archives

Records and Information (or, Why We Do What We Do)
- If information is power,
- and records are the storage of information,
- then records must be preserved for future generations.

Shifting Media
- Historically, records were stored on paper and kept in filing cabinets.
- When the cabinet was full, records were sent to the file room.
- Now records are stored electronically on computers.
- When the computer is full, more hard drives are added.
- The basic skills needed to manage and maintain records have been lost, replaced by infinite storage.

Higher Standards
- As electronic records become more integrated into society, producers of those records will be held to higher standards of conduct:
  - HIPAA
  - SOx
  - Federal and state mandates
  - Case law

WA Public Records Laws
- As defined in RCW 40.14: ANY records that have been made by or received by any agency of the state of Washington in connection with the transaction of public business.

Records Retention
- Any destruction of official public records shall be pursuant to a schedule approved under RCW 40.14.

Why?
- The foundation of democracy in America is government accountability to the people.
- So the question becomes: who takes care of the records, and do they have the knowledge?

Caretakers of Information
- Historically, records were sent to the file room; staff maintained access to records and managed their lifecycle based on need and legal requirements.
- Now records are managed by users and IT staff, based on capacity and cost.
- Neither group is trained in the science of information management.

Why a Digital Archives?
- Comply with statutory and regulatory mandates. The law requires preservation of certain public records; it doesn't specify whether those records are paper or electronic. All records must be given the same care.
- Avoid loss of legal and historical records. As technology changes, older media (5.25-inch floppy disks, for instance) become harder to read.
- Centralize records. Centralization means uniformity in maintenance, and trained professionals serve as caretakers.
- Preserve rare and at-risk paper records.
- Improve access for citizens. By centralizing historical electronic records in one location, one-stop shopping will provide the information quicker and easier.

What the Digital Archives Is Not
- Not mass storage for active business applications and data
- Not remote back-up for state and local government networks and data

The Digital Archives will:
- Preserve electronic records with long-term legal, historical and/or fiscal significance
- Assure platform-neutral retrieval 50, 100, or more years from now
- Provide security back-up of certain permanent electronic legal records (courts, vital records, land records, etc.)

Project History
- 2001 Session: Legislative approval (SSB 6155, 2001-2003 Capital Budget)
- January to September 2002: Building programming
- January 2003: Building construction begins
- September 2003: ISB technology review
- October 2004: Grand opening
- Q4 2006: Full implementation

Monies In and Out
- Primary funding source: $1 surcharge
- Expenditures:
  - $14.5M joint use facility
  - $1.5M technology acquisition
  - $950,000 software development
- Ongoing budget of $2.1M/year

Requirements to E-Archive
- Hardware
- Software
- Management
- Authenticity

Hardware
- File room of the 21st century
- Capacity and speed double every 18 months
- Many choices:
  - Tape
  - Optical
  - Spinning disc
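The doubling rule on the Hardware slide is plain compound growth, and it helps explain why hardware ages so quickly. The sketch below is only illustrative: the function names are hypothetical, and the 18-month period is the slide's rule of thumb rather than a measured figure.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Return how many times capacity multiplies after `months` of doubling."""
    return 2.0 ** (months / doubling_period_months)


def projected_capacity_tb(current_tb: float, months: float,
                          doubling_period_months: float = 18.0) -> float:
    """Project a capacity figure forward under the same doubling assumption."""
    return current_tb * growth_factor(months, doubling_period_months)


if __name__ == "__main__":
    # Horizons chosen for illustration only.
    for months in (18, 36, 48):
        print(f"{months} months: x{growth_factor(months):.2f}")
```

Under this assumption, equipment bought today is competing with hardware more than six times as capable after four years.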

First Immutable Law of Digital Archiving
- What hardware you use today will be obsolete within four years.

Washington State Digital Archives Network Configuration, May 2, 2005
[Network diagram. Labels recoverable from the extracted text are listed below; which configuration belongs to which host is not recoverable.]
- Clients: Citizen Internet User (http/https, ports 80/443); State/Local Office (Secure FTP, port 22)
- Zones and tiers: Internet, Firewall, DMZ, Services Tier (Search Services), Processing Tier, Data Tier
- Hosts: DA-DC1, DA-DMZ-DC1, DA-DMZ-DC2, DA-SE2, DA-SE3, DA-SE4, DA-SE5, DA-WEB1, DA-WEB2, DA-BIZ-RS1 and DA-BIZ-RS2 (BizTalk Receive/Send Location), DA-BIZ-INBOX1 (RAW Data Temp Storage, Image Conversion, XML Temp Storage), DA-Tectia1 (Secure FTP), DA-Media1 & 2 (Images & Streaming Media)
- Load balancing: 2 Coyote hardware load balancers; IIS servers hardware load balanced
- Server configurations:
  - HP DL380: 2 x 3 GHz HT CPU, 2 GB RAM, 36 GB mirrored HD, MS Windows 2003 Standard
  - HP DL380: 2 x 3 GHz HT CPU, 2 GB RAM, 144 GB RAID 5 HD, MS Windows 2003 Standard, MS BizTalk 2004 Enterprise
  - HP DL580: 4 x 3 GHz CPU, 4 GB RAM, 36 GB mirrored HD, MS Windows 2003 Enterprise
  - HP DL580: 4 x 3 GHz CPU, 4 GB RAM, 36 GB mirrored HD, MS Windows 2003 Enterprise, MS SQL Server 2000, MS Clustering Active/Active
  - HP DL740: 8 x 3 GHz HT CPU, 8 GB RAM, 36 GB mirrored HD, MS Windows 2003 Enterprise, MS SQL Server 2000, MS Clustering Active/Passive
- Storage: EMC Clariion CX700 SAN (1 TB 15K FC, 4 TB 7200 SATA); ADIC iScalar 2000 tape library (10 LTO-2 drives, 500 tape slots); Shared Disk Array
- Other labels: Domain Controllers, Digital Archives Asset Metadata Cluster, Database Cluster, BizTalk 2004, Internet Send/Receive, Database Server, Web/FTP Server, Web Services, BizTalk Server, Administration

Digital Archives Hardware
- Network: Cisco backbone end to end, LAN and SAN
- EMC SAN storage: 5 TB now, 20 TB by end of year
- HP servers and desktops
- ADIC tape library for offsite disaster recovery
- Microsoft software and development with EDS

Archival Software
- Formats:
  - Native
  - ASCII
  - TIF
  - PDF/A
  - XML
- Whenever possible, seek the open, documented solution!
- Remember WordStar and dBase II?

File Formats
- Digital Archives multi-pronged approach:
  - Stored as BLOBs in DB with metadata:
    - Maintain native format, wrapped
    - Create open file format version
    - Render XML formatted version, wrapped
  - Acquire original hardware and software

Content Management
- Essential to maintain control of the information explosion
- Allows hard-coded rules and information exchange
- BUT still requires a strong knowledge, understanding and implementation of basic records management

Second Immutable Law of Digital Archiving
- Data is data, a record is a record. It is the content that drives retention, not the media.

Content Management
- Not true CM but rather archival storage and retrieval
- DoD 5015.2-STD compliant system
- Wrap original file in native format
- Wrap XML copy
- Apply metadata and XML for indexing, searching and retrieval
- Provide chain of custody and authenticity

Content Management
- Microsoft solution
- Custom-coded .NET front end
- SQL Server back end
- BizTalk translation utility
- SSH Tectia for secure transport

Authenticity
- Maintain chain of custody
- In the care of trusted 3rd party
- Received from trusted, known source
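The "wrap original file" and "chain of custody" points above can be made concrete with a small sketch. This is not the Digital Archives' actual schema or code: the element names (`ArchivalRecord`, `Metadata`, `Fixity`, `OriginalFile`) and both function names are hypothetical, and Python stands in for the .NET/BizTalk stack the deck describes. MD5 is used only because the deck names it for original files.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_record(original: bytes, filename: str, metadata: dict) -> bytes:
    """Wrap an original file, its metadata and a fixity digest in one XML document."""
    root = ET.Element("ArchivalRecord")  # hypothetical element names throughout

    meta = ET.SubElement(root, "Metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, "Field", name=key).text = str(value)

    # Digest of the untouched original bytes, recorded at wrapping time.
    fixity = ET.SubElement(root, "Fixity", algorithm="MD5")
    fixity.text = hashlib.md5(original).hexdigest()

    # The native-format file travels inside the wrapper, base64-encoded.
    payload = ET.SubElement(root, "OriginalFile", name=filename, encoding="base64")
    payload.text = base64.b64encode(original).decode("ascii")

    return ET.tostring(root, encoding="utf-8")


def verify_record(wrapped: bytes) -> bool:
    """Recompute the digest of the embedded file and compare it to the stored one."""
    root = ET.fromstring(wrapped)
    original = base64.b64decode(root.findtext("OriginalFile") or "")
    return hashlib.md5(original).hexdigest() == root.findtext("Fixity")
```

Calling `verify_record` on a wrapper at any later date shows whether the embedded file still matches the digest recorded when it was wrapped. The digest only helps if it is also kept somewhere an attacker cannot rewrite along with the payload.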

Data Security
- Encrypted SSH FTP transmission
- Issue digital certificate
- Verify IP and computer information
- MD5 hash on all original files
- Copy of FTP on tape prior to ingestion
- DB backups on tape
- Record-level security for confidential info

FTP Fingerprint
FTPUpload Date="8/23/2005 9:13:05 AM" NTUserName="temp" Domain="CRISPLUS" SFTPUserName="FranklinCoAuditor"
HostInformation WindowsVersion="Microsoft Windows NT 5.0.2195.0" CPU ID="x86 Family 15 Model 2 Stepping 9, GenuineIntel" Level="15"
Local Area Connection:
    Connection-specific DNS Suffix . :
    Description . . . . . . . . . . . : Intel(R) PRO/100 VE
    Physical Address. . . . . . . . . : 00-0D-60-3C-22-34
    DHCP Enabled. . . . . . . . . . . : Yes
    Autoconfiguration Enabled . . . . : Yes
    IP Address. . . . . . . . . . . . :
    Subnet Mask . . . . . . . . . . . :
    DNS Servers . . . . . . . . . . . :
    Primary WINS Server . . . . . . . :
    Secondary WINS Server . . . . . . :

Record Level Security
- Restrict records at item, field or series level
- Restrict to individual, dept, office or global
- Uses authenticated login to reveal fields
- Anonymous users see "Restricted"

Open Record
Restricted Record
Confidential MOU

Ingestion Process
- MUST be flexible
- No mandate, and 3,300 agencies
- Microsoft BizTalk 2004
- Transforms, adds metadata based on business rules
- Creates deep storage copy wrapping original file in XML, with hash
- Creates web version of original file

BizTalk 2004
[Diagram: varying source field names (fname, firstname, Fst_name, first) and date formats (Jun-07-05, 07-Jun-05, 06/07/05, 06/07/2005) pass through BizTalk predefined pipelines and come out as one field name (First_Name) and one date format (06/07/2005).]
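The normalization shown in the BizTalk 2004 diagram can be illustrated outside BizTalk. The sketch below is a Python stand-in, not BizTalk's pipeline API: the alias table, the function names and the `date_fields` parameter are all assumptions, and it treats `06/07/05` as month-first because the diagram's other spellings all name June 7.

```python
from datetime import datetime

# Hypothetical alias table: source field spellings mapped to one archive field name.
FIELD_ALIASES = {
    "fname": "First_Name",
    "firstname": "First_Name",
    "fst_name": "First_Name",
    "first": "First_Name",
}

# Input formats tried in order; %b assumes English month abbreviations (C locale).
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%y", "%m/%d/%Y")


def normalize_field(name: str) -> str:
    """Map a source field name to the archive's field name, if an alias is known."""
    return FIELD_ALIASES.get(name.strip().lower(), name)


def normalize_date(value: str) -> str:
    """Rewrite any accepted date spelling as MM/DD/YYYY."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized date format: {value!r}")


def normalize_record(raw: dict, date_fields=("Date",)) -> dict:
    """Apply field-name and date normalization to one incoming record."""
    out = {}
    for key, value in raw.items():
        name = normalize_field(key)
        out[name] = normalize_date(value) if name in date_fields else value
    return out
```

For example, `normalize_record({"Fst_name": "Ann", "Date": "07-Jun-05"})` yields the same field names and date spelling as a record submitted with `fname` and `Jun-07-05`. This lets agencies keep their own conventions while the archive stores one.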

Deep Storage XML Schema
[Diagram. Recoverable labels, in extraction order: Record; Common; Who; Vital Records; What; Type; When; Where; Original File; Web File; Security; Fixity; Birth; Date of; Father, Mother; Hospital.]

Deep Storage XML

Archive Database
- Designed around latest industry standards
- Open source, non-proprietary file storage
- Applies metadata tags to save information about record creator, date, agency, subject, etc.
- Provides chain of custody and authenticity of record
- Allows search and retrieval of archival records through a web page

Web Design Wire Frame

Admin Pages
- Requires authenticated log-in
- Allows viewing of confidential information
- E-Transmittal process
- Viewing of open orders

Who's Visiting?
- Average of over 300 visits per day
- Average length of stay: 9 minutes
- 6% .gov, 4% .edu, 1% .org
- 13% came from Internet search (Google, MSN, Yahoo)
- Visitors from: Canada, US Military, Romania, Germany, France, Australia, Japan, UK, Netherlands, Russia, Thailand, Portugal, Belgium, Poland, Italy, Indonesia, Singapore, Sweden, Mexico, New Zealand, Czech Republic, Hungary, Brazil, Norway, Colombia, Austria, Greece, Bulgaria, China, Yugoslavia, Philippines, Spain, South Korea, Denmark, Oman, Pakistan, South Africa, Jamaica, Switzerland

Risks
- Distributed, non-standardized environment
- No mandate to use Digital Archives
- Limited technology expertise in some agencies
- Unpredictable data growth rate
- Few business models
- Emerging technologies
- Limited internal expertise

Management Issues
- Authenticity of record
- Metadata
- File naming conventions
- Corporate culture
- Start small with e-mail, web page
- Use existing retention schedules
- Educate
- Shift AWAY from desktops
- Management software is a must!
- Privacy of sensitive data

Third Immutable Law
- Anything that you do today will need a major overhaul in two years.
- Technology and industry are changing at unprecedented rates.
- But more records are lost every day!
- The key is to be flexible and attack with forethought.

Digital Archives
Eastern Washington University, Cheney, Washington
Adam Jansen, Digital Archivist
[email protected]

Secure FTP

Custom FTP Configuration
- Uses SSH Tectia client
- 128-bit encryption
- Ease of use
- Minimal user interaction/intervention
- Simple notification
- XML log file output
- Digital footprint

Right Click Send To
Drag and Drop
Double Click

Send Notifications
- Minimal notification
- Minimal user interaction
- Ease of understanding of notification
- Quick notification of errors
- Ease of cleanup of sent files

No Data Error
Duplicates
Possible Errors
Completion
Delete

E-Commerce
- Add to shopping cart

Ecommerce Functionality
- Add to shopping cart
- Shopping cart
- Shipping info
- Billing information
- View and submit order
- Confirmation
- Order request
