The Preservation of Digital Archives

Introduction

The volume of records created in a digital format is growing. Such records can take a number of forms, including websites, emails, word-processed documents, blogs and spreadsheets. The records we keep digitally, however, are more vulnerable than paper records, largely because technologies are evolving at such a fast pace. The lifespan of a digital record can be as little as five years. Hardware and software fall out of use, while the media on which information is stored deteriorate. Essential information can therefore easily become unreadable and unrecoverable.

We therefore need to manage digital records actively and appropriately in order to ensure their long-term accessibility. These guidelines are intended for individuals and small organisations who hold records in a digital format, and offer practical advice on the steps needed to manage these records suitably.

Using these guidelines to manage your digital records appropriately will also help you to identify those records which might be worth preserving permanently as archives. If you subsequently offer these to your local archive service, the records should already be organised and in a format which facilitates their long-term preservation for the benefit of future generations.

The guidelines cover the following areas:

  1. Organising and naming files
  2. Email
  3. Software media, file formats and changing technology
  4. Housekeeping
  5. Passwords and encryption
  6. Intellectual property rights and privacy
  7. Handling legacy digital files
  8. Websites and social media

1. Organising and naming files

1.1 Structure your folders and files in some sort of logical order. This will make it easier to find the file you need.

1.2 If more than one person is creating and saving files, ideally you should all save them to the same place rather than keeping separate filing systems. If you do not already have a shared network, consider setting one up.

1.3 Standardise how you name files and folders and avoid using acronyms or abbreviations which may only be meaningful to the person who named them – avoid ‘HS’ for ‘Health and Safety’, ‘TM’ for ‘Team Meetings’ or ‘Steve’s files’, for example.

1.4 When creating new versions of existing documents, make sure that you know which is the most recent version. Use the file name (filenamev1, filenamev2, for example) to help you.

1.5 Get into the habit of adding information to the body of your documents which helps to identify them – for example, a table at the end of each file giving the date created, title, purpose, author, version etc. This will help explain the document to others and will help you when you re-discover the document at some later date.

1.6 Ensure that someone oversees the organisation of your files and makes sure that everyone is familiar with the conventions and procedures you establish and understands why they are important.

1.7 Any conventions and filing systems you have must be helpful and relevant. Review and update what you do regularly.
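
The version-numbering convention in 1.4 can be checked automatically. The following is a minimal Python sketch, not a definitive tool. It assumes that file names end in ‘v’ followed by a number before the extension (minutesv1.odt or minutes_v2.odt, for example); the function name latest_version and the sample file names are hypothetical.

```python
import re
from pathlib import Path

# Assumed convention: the name (without extension) ends in "v" plus a number,
# optionally preceded by a space, underscore or hyphen, e.g. "minutesv2".
VERSION_PATTERN = re.compile(r"^(?P<base>.+?)[ _-]?v(?P<num>\d+)$", re.IGNORECASE)


def latest_version(filenames):
    """Return {document name: file name carrying the highest version number}."""
    latest = {}
    for name in filenames:
        path = Path(name)
        match = VERSION_PATTERN.match(path.stem)
        if match is None:
            continue  # the file does not follow the convention, so skip it
        document = match.group("base").lower() + path.suffix.lower()
        number = int(match.group("num"))
        # Compare version numbers as integers so that v10 ranks above v2.
        if document not in latest or number > latest[document][0]:
            latest[document] = (number, name)
    return {document: name for document, (number, name) in latest.items()}
```

Given the names minutesv1.odt, minutesv10.odt and minutes_v2.odt, the function reports minutesv10.odt as the most recent version of minutes.odt. Files that do not follow the convention are ignored rather than guessed at.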

2. Email

2.1 Manage your emails effectively by deleting those that have no long-term value and organising the rest into subject folders with relevant titles.

2.2 Manage your inbox so that it contains only those messages to which you have not yet responded, or which relate to current business. File or delete the rest as soon as possible. Remember to also manage your sent messages effectively.

2.3 Select your email client and web mail service carefully. Email clients use two main protocols to access email, namely POP3 (Post Office Protocol version 3) and IMAP (Internet Message Access Protocol):

  • A POP3 client usually downloads email from the server to your PC and then deletes the email from the server. This means that email messages are at greater risk of loss, so you may wish to consider making back-ups;
  • An IMAP client accesses emails on the email server and normally leaves them there; this means that you usually need to stay online whilst reading, composing and sending mail.

2.4 Try to avoid email software which stores your messages in a proprietary database format.
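
As an illustration of 2.4, one openly documented, plain-text alternative to a proprietary mail store is the mbox format, which Python's standard mailbox module can read and write. The sketch below is a hedged example only: the function name save_to_mbox and the file path are hypothetical, and it assumes that no other program is writing to the same file at the same time.

```python
import mailbox
from email.message import EmailMessage


def save_to_mbox(messages, mbox_path):
    """Append email messages to an mbox file and return how many it now holds."""
    box = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    try:
        for message in messages:
            box.add(message)
        box.flush()  # write the pending changes to disk
        return len(box)
    finally:
        box.close()


if __name__ == "__main__":
    # Hypothetical sample message, for illustration only.
    sample = EmailMessage()
    sample["From"] = "sender@example.org"
    sample["To"] = "recipient@example.org"
    sample["Subject"] = "Team meeting minutes"
    sample.set_content("Minutes attached.")
    print(save_to_mbox([sample], "archive.mbox"))
```

Because an mbox file is plain text, the saved messages can still be opened in a text editor if the original email software is no longer available.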

3. Software media, file formats and changing technology

3.1 Media

  • Use high quality media and devices.
  • Handle media with care.
  • Keep access devices well maintained and clean.
  • Do not stick labels on optical disks or mark them with a pen or pencil – follow the manufacturer's advice on labelling.
  • Store magnetic tape away from strong magnetic fields.

3.2 File formats

  • Minimise the number of file formats you use as far as possible.
  • Use "open", non-proprietary, well-documented file formats wherever possible, for example OASIS Open Document Format (ODF) for word-processed documents. For databases, consider open-source systems such as MySQL, PostgreSQL and Firebird. PDF is a file format widely used for presentation copies of office documents which are not intended to be edited by those viewing the file. PDF/A is a version of PDF which has been restricted to basic PDF features in order to simplify its long-term preservation.
  • Alternatively use file formats which are well developed and have been widely adopted e.g. Microsoft Office Suite.
  • Create digital images as high quality master images (min. 300dpi) in TIFF format, which is a well-supported open standard. If sending pictures via email, or adding them to a website, create lesser quality, but more easily transportable 'throwaway' versions in JPEG format.

3.3 Websites

  • Comply with W3C recommendations and make sure your HTML, CSS etc. is valid. Select open standard formats for images, audio, video etc. wherever possible.

3.4 Technology watch

  • Be as aware as possible of changing software and hardware technologies and the implications for your electronic records.
  • Migrate electronic records to more up-to-date file formats and media wherever possible.

4. Housekeeping

4.1 Try to keep the PCs, CDs, DVDs and hard drives you use as up-to-date as possible. All such hardware and storage media will eventually fail, but you can maximise their working life by keeping them clean and in an environment which is not subject to extremes of temperature and humidity.

4.2 Install anti-virus software and a firewall. There are many options, some of them free. Scan for viruses regularly.

4.3 Decide which files you could not function effectively without. If you do not already have a means of backing up these files, find one and use it regularly. The same applies to files that are difficult or expensive to recreate (images or a website, for example). Keep a record of which files need to be backed up and where the back-up is. Keep your back-up files in a location separate from your PC, laptop or office. Options include:

  • Copying your files to magnetic tape. This is particularly useful when dealing with large amounts of data.
  • Copying your files to an external hard drive.
  • Buying online file storage. There are many providers of this type of service.
  • Copying your files to CD/DVD. The advantage of this is that it is relatively cheap and CDs and DVDs are familiar to most people. You will almost certainly have to split your files across a number of CDs/DVDs, however, and both CDs and DVDs degrade over time or become unreadable as hardware develops.
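
Whichever option you choose, it is worth confirming that each back-up copy matches the original. The following is a minimal Python sketch of copying a folder to a back-up location (an external hard drive, for example) and verifying every copy with a SHA-256 checksum. The function names and folder paths are hypothetical, and the sketch assumes that the back-up folder is not inside the folder being backed up.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, backup_dir):
    """Copy every file under source_dir to backup_dir and verify each copy.

    Returns a list of (relative path, checksum) pairs, which can serve as
    the record of what was backed up.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    record = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue  # folders are created as needed below
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        checksum = sha256_of(source)
        if sha256_of(target) != checksum:
            raise IOError(f"Back-up copy of {relative} does not match the original")
        record.append((relative.as_posix(), checksum))
    return record
```

Saving the returned list alongside the back-up gives you both the record of which files were backed up and a means of checking later whether any copy has degraded.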

4.4 Delete old versions of files, working documents, notes etc., unless you still need to refer to them.

5. Passwords and encryption

5.1 Passwords and encryption can help to keep data safe and secure.

5.2 It is best to choose open-source encryption software to ensure its continued availability.

5.3 Remember passwords! Otherwise your data will become inaccessible.

5.4 Password managers are available online and enable you to securely manage passwords.

5.5 When passing digital files on to a successor remember to also provide any password and encryption information.

5.6 When depositing digital records with an archive service remember to remove any passwords or encryption.

6. Intellectual property rights and privacy

The ease with which digital records can be created, accessed, modified and published to the world potentially increases the risks of infringing copyright and Data Protection legislation.  Your records may contain material – reports, images, music etc – created by others which you need to handle carefully: copying, editing or forwarding copyright material could be a breach of copyright law.

6.1 Try to record details of the copyright holders of such material – you could, for example, do this in the file properties.

Your records may also contain information about individuals.

6.2 Personal archives – although the Data Protection Act does not apply to personal records while they remain in the creator's possession, it comes into force once custody is transferred to a public repository.

6.3 Organisational archives – all organisations which process personal data must manage their electronic records in a way which conforms to the Data Protection Act 1998.

7. Handling legacy digital files

7.1 In some cases you may receive digital files from a predecessor that are stored on older hardware, or in unfamiliar formats or software.

7.2 Some older media can be fragile so be careful about trying to read or copy data.

7.3 If you are having difficulty reading the media or hardware then seek advice on copying to new media.

7.4 Undertake a survey of what you hold. Identify and delete duplicates or low-value material. Some software and command prompts can help you identify duplicates, file structures and file names; such information may remove the need to read each file.

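7.5 You can use free tools to help you identify file formats and the risks of obsolescence, such as DROID, a file format identification tool, and PRONOM, an online registry of file format information, both provided by The National Archives. Seek advice on migrating to a new and appropriate format.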
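
The survey described in 7.4 can be partly automated. The following is a minimal Python sketch that lists groups of files with identical content without opening them by hand. It is an illustration only: the function name find_duplicates is hypothetical, it reads each file fully into memory (so it suits modest file sizes), and it only reports duplicates, leaving the decision to delete to you. It is safest to run it on a copy of fragile legacy media rather than on the originals.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder):
    """Group files under folder by content and return the groups of two or more."""
    root = Path(folder)
    by_checksum = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Files with the same SHA-256 checksum have the same content.
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            by_checksum[checksum].append(path.relative_to(root).as_posix())
    return [names for names in by_checksum.values() if len(names) > 1]
```

Each group in the result lists the relative paths of files whose contents are identical, even where their names differ.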

8. Websites and social media

8.1 The 'fluid' nature of the web means that web pages or entire websites frequently change or disappear, often without leaving any trace. You may wish to consider archiving all or part of your website.

8.2 The British Library and the UK Web Archive are currently ‘harvesting’ various websites as part of their web-archiving project.

8.3 The Internet Archive collection features archived versions of websites. This organisation, which was founded in 1996 and is based in California, has acquired 10 billion web pages and has collected local government websites. You may wish to check whether it has already archived your website.

8.4 The Internet Archive's subscription service, Archive-It, allows institutions to build and preserve their own web archive, through a user-friendly web application, without requiring any technical expertise or hosting facilities. Subscribers can harvest, catalogue, and archive their collections, and then search and browse the collections when complete. Collections are hosted at the Internet Archive data centre, and are accessible to the public.

8.5 You may also wish to use the above to archive blogs or other social media material you have created.
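
The check suggested in 8.3 can also be made programmatically. The sketch below assumes that the Internet Archive's Wayback Machine availability service is reachable at https://archive.org/wayback/available and returns JSON containing an archived_snapshots entry with a closest snapshot; both the address and the response layout are assumptions which should be confirmed against the Internet Archive's current documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site):
    """Build the query URL for a given website address."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": site})


def closest_snapshot(response):
    """Return the URL of the closest archived snapshot, or None if there is none.

    The expected layout of the response is an assumption (see above).
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_site(site):
    """Ask the service whether a site has been archived (needs a network connection)."""
    with urllib.request.urlopen(availability_url(site), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

Calling check_site with your own website address returns the address of an archived copy if one is reported, or None otherwise. The network request is kept separate from the parsing so that the parsing can be checked without going online.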

Useful Links and further advice

Digital Preservation Coalition

The National Archives