Taking the plunge: preparing your research for the digital archive

  "16474476590_60011f6319_o" by Gerd Leonhard. CC-BY-SA

When you have completed a piece of work and you are ready to archive it, you need to make sure it will be accessible for the long term and that the value of the work is preserved. For born-digital research this means saving the work in a standard format and organising the files logically. For practice-based research and research expressed in a physical format, your work will need to be transformed into a digital format.

Your research outputs, data or supporting materials need to be in a digital format before they can be deposited in the Kent Academic Repository (KAR), the Kent Data Repository (KDR) or another online repository. The format must be suitable for preservation, sharing and reuse. Choose it carefully so that the qualities of the work you want to communicate are retained. Adopting good practice at an early stage will make it easier for you to work with your files later and will help others experience and understand the original work.

Analogue to digital

The original format of the research is usually the one most appropriate to express its value and there is a risk that digitization loses important elements. However, digitization is useful as it allows:

  • the idea of the work to be shared using online technology. An ‘online version of record’ will improve its exposure and dissemination and will help you to ensure that others see your preferred online interpretation of the work on social media.
  • the work to attract online metrics. Hosting the digitized version on a reputable repository will enable you to assign accurate descriptions (metadata) and a Digital Object Identifier (DOI) to the work. This will allow you to track citations of the work and mentions on social media.
  • long-term, secure preservation of the work. This is especially important where physical preservation is unavailable or prohibitively expensive, or where the work is a performance or an ephemeral event. In those cases, recording it on film or audio can be the only way to ensure the work is captured.
  • the creation of a record of the work that can be presented as evidence in internal and external reporting, such as the REF, or to funders, publishers and colleagues.

How you choose to digitize your work will depend on what you want to communicate and achieve through the process, and on the qualities of the work itself. If you need help, contact researchsupport@kent.ac.uk and we will make sure your query is directed to the best person for advice.

Digital surrogates

The digitized version may not exactly represent the experience of the original. It is never intended to replace the work, because it is not ‘the same’ thing. For this reason the term ‘digital surrogate’ is used to distinguish the digitized version from the original. You may need to produce more than one digital surrogate. For instance:

  • A book, notebook or written piece of text – a flatbed scanner will record the flat image of the pages as they appear, but transcribing handwritten text into a separate file will make it more accessible to screen readers and text-mining software.
  • A painting, drawing or flat artwork – a flatbed scanner could be used, but if the work is very large, delicate or detailed, photographs may be more appropriate. Multiple images, including close-ups and different angles, may be used to communicate the content in its entirety.
  • A sculpture or 3-dimensional work – a photograph, a series of photographs or a film could be accompanied by digitized sketches and narrative.
  • A building – again, a collection of photographs or a film and a commentary.
  • A time-based work – a recording, a film or a written description.
  • A performance – a film, audio recording, annotations and written descriptions. Where the audience reaction is part of the research expression, these could include multiple recordings, separate interviews and so on.

The choice of media is up to the researcher. In all cases, however, a narrative (written, recorded or both) will be necessary to explain how the digital surrogates represent the original and any changes caused by the digitization process. You can include this with the files when you archive them. See Archiving your work and File documentation and metadata below.

Digital to digital

Even where the original format is digital or online, you must save the work in a format that is compatible with long-term preservation and usable by anyone, now or in the future. Work created with specific software will usually be in a proprietary file format tied to that software (e.g. MS Excel, SPSS). This software may not be available in the future, or to some users of your data. Even you may have problems opening files in non-standard formats if you move to another university or country. Conversion should be carried out by someone familiar with the originals, because the completeness and accuracy of the copies will need to be checked. A checksum tool like MD5summer lets you verify the integrity of copies as you move them around. Here is a short video about it.
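MD5summer is a desktop program. If you prefer to script the same kind of check, the Python sketch below uses only the standard library (`hashlib`, `pathlib`). It is a minimal illustration, not a replacement for a dedicated tool. It records an MD5 checksum for every file in a folder in a manifest file, then re-checks a folder (for example, a copy on another drive) against that manifest. The function names and the manifest file are hypothetical choices for this example. Each manifest line follows the `checksum  filename` layout used by the common `md5sum` command.

```python
import hashlib
from pathlib import Path


def md5_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 checksum of one file as a hexadecimal string.

    The file is read in chunks (1 MiB by default) so large files need not fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path, manifest: Path) -> None:
    """Write one 'checksum  relative/path' line for every file under folder.

    Hypothetical helper: the manifest itself is skipped if it sits inside folder.
    """
    lines = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.resolve() != manifest.resolve():
            lines.append(f"{md5_checksum(path)}  {path.relative_to(folder).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(folder: Path, manifest: Path) -> list[str]:
    """Return the files listed in the manifest that are now missing or changed."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, name = line.split("  ", 1)
        target = folder / name
        if not target.is_file() or md5_checksum(target) != expected:
            problems.append(name)
    return problems
```

For example, run `write_manifest(Path("my_project"), Path("my_project.md5"))` before copying. Then run `verify_manifest` on the copied folder with the same manifest. An empty list means every file matched. The folder and manifest names here are placeholders.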

Complex digital works, like a web page or dynamic internet resource, may require special tools to convert them to another file type while keeping the creative effect accessible and usable in the future. Sites can be preserved by copying the files that make them up, but you will also need images of the way they appear or emulations of how they behave. For instance, Webrecorder is a free and straightforward tool for recording websites. It creates files in a standard format (WARC) that can be archived and replayed with the relationships between site pages intact.

Recommended formats

File formats recommended for archiving have in common that they are:

  • open or generic and not specific to a particular software or hardware product.  They can be used with free software applications as well as proprietary brands and do not require licences to use;
  • sustainable and stable, so they are unlikely to become obsolete and will remain usable over time;
  • widely used and adopted across the relevant community so anyone will be able to use the content;
  • not ‘lossy’ so that the files can be converted to other formats without losing quality or content.

There is some tension between preservation and sharing. High-quality files that contain all the information about the content (lossless) are large and difficult to share, whilst smaller files that can be shared easily lose some quality and information (lossy). In these cases, it is a good idea to archive two versions: one in a lossy format for sharing, and another in a lossless format for preservation and for when complete reproduction is required. There is no point re-saving a file that was created in a lossy format in a lossless format, as there will be no additional detail to preserve.
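The last point can be seen in a toy Python sketch that uses only the standard library. Here `zlib` compression stands in for a lossless format, and rounding numbers to a coarser step stands in for lossy compression. Real formats such as flac or mp3 are far more sophisticated, and the function names are hypothetical. Storing the lossy result losslessly reproduces the lossy version exactly, but none of the discarded detail comes back.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib: the output always equals the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_reduce(samples: list[int], step: int) -> list[int]:
    """Toy lossy step: round each sample to the nearest multiple of step, discarding detail."""
    return [round(s / step) * step for s in samples]


original = [3, 7, 12, 18]
lossy = lossy_reduce(original, 5)  # [5, 5, 10, 20]
# Saving the lossy version in a lossless way keeps it exactly...
assert lossless_roundtrip(bytes(lossy)) == bytes(lossy)
# ...but it is still the lossy version, not the original.
assert lossy != original
```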

The most commonly recommended formats are:

Text and documents created using MS Word or presented in PDF format

  • save as PDF/A, plain text (txt), rtf (Rich Text Format) or odt (OpenDocument Text). PDF/A retains the layout and formatting and is used in most page-turning applications. txt files are interoperable with most text-editing software and have no complex formatting.

Spreadsheets and tables created using MS Excel

  • save as csv (comma-separated values) or tab (tab-delimited) files. If your files contain many macros, you can also keep them in MS Excel format, as this is a widely used package, but a backup csv file of the content is advisable. Many proprietary software packages (like SPSS) offer a standard or portable file type that is suitable for sharing, reuse and preservation.
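The Python sketch below is a rough illustration of saving as csv or tab. It writes tables that are already loaded in memory (a hypothetical `sheets` dictionary mapping each sheet name to its rows) to one delimited text file per sheet, using only the standard `csv` module. Reading the original .xlsx file would need a separate library and is not shown. Only cell values are written. Formulas, formatting, macros and links between sheets are not carried over, which is why the original workbook is worth keeping alongside. Sheet names are assumed to be safe to use as file names.

```python
import csv
from pathlib import Path

# File extension used for each supported delimiter.
EXTENSIONS = {",": ".csv", "\t": ".tab"}


def export_sheets(sheets: dict[str, list[list[object]]], out_dir: Path,
                  delimiter: str = ",") -> list[Path]:
    """Write each named table to its own delimited text file and return the paths.

    delimiter="," produces comma-separated (.csv) files; delimiter="\\t" produces
    tab-delimited (.tab) files. Every cell is written as plain text.
    """
    if delimiter not in EXTENSIONS:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in sheets.items():
        path = out_dir / f"{name}{EXTENSIONS[delimiter]}"
        # newline="" lets the csv module write its own line endings;
        # UTF-8 keeps accented and non-Latin characters intact.
        with open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle, delimiter=delimiter).writerows(rows)
        written.append(path)
    return written
```

The csv module adds quotation marks around any value that contains the delimiter, so a comment such as "late, but complete" stays in a single column.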

Images scanned, photographed or created using image manipulation software (like Adobe Photoshop)

  • save as tiff files to preserve the maximum detail and information contained in the file. Save a jpeg or PDF/A version as well for immediate sharing, use on the web, or by email.

Audio recordings

  • save as flac, an open lossless format. If recordings were created as wav or mp3 files, these are also suitable for long-term preservation, use and sharing, although mp3 is a lossy format.

Video recordings

  • save as mp4, ogv or mj2

Information on other source types, and more detail, is available from the UK Data Service. The UK Data Service is funded by the Economic and Social Research Council (ESRC) to meet the needs of researchers, students and teachers from all sectors. Their guidance is intended for research data but applies to any situation where digital files need to be curated for preservation, reuse and sharing. They offer detailed advice on recommended formats for many types of material, and their extensive website covers all aspects of data archiving. You may also like to look at the Library of Congress information pages, which have lots of advice on preservation. Their Recommended Formats Statement covers a slightly different range of source types, for instance website preservation in detail.

Archiving your work

To ensure that your work is preserved and available to use and share in a safe and reliable way, upload the files with the accompanying documents and metadata to a reputable repository. The University of Kent supports KAR and KDR for its staff and students’ research, but there are many others that offer specialist services for subject- and funder-specific works. See Re3Data for a list of repositories that specialise in archiving research data, or OpenDOAR for a general list of repositories. All the services listed by these directories are quality-assured before inclusion.

Using a recommended archive is a good idea because:

  • it will have the sustainability to preserve your work for the foreseeable future and the governance and administrative backing to keep it safe, secure and accessible.  You will not need to worry about storing the work yourself, or checking the integrity of the files over time;
  • access to and sharing of your work will be managed for you and licences can be applied to ensure users know how they can use and re-share your work.  You will not need to answer queries or repeatedly send out copies of your work;
  • your work will be recorded in a standard format with digital object identifiers and metadata compliant with international standards that will enable accurate and consistent citation and metrics relating to its reuse;
  • you will be able to demonstrate compliance with funder, publisher and institutional reporting and preservation requirements.

The Kent Academic Repository (KAR) and Kent Data Repository (KDR) are available to university members. The Research Support Team can help with queries about using these services or any other archives that may be more suitable. Before archiving or sharing your work, you should check the policies of other stakeholders such as funders or publishers. The Jisc SHERPA services are a good starting point if you are not sure what your publisher or funder policy is.

Once your work is archived, you will want to be able to retrieve and use it. Most archives offer only secure and findable storage for your files, not the ability to use them as originally intended. Some open formats also lack the functionality of the originating formats. For example, a multi-tab Excel workbook needs to be saved as several single-sheet csv files, and the links between sheets are lost. Emulator software allows computers to run files and programs designed for systems that are no longer available, and other tools can replicate more tactile experiences. For instance, page-turning software can present digitised books with a more book-like experience. Replay software such as Webrecorder Player can recreate your website from your archived WARC files.

File documentation and metadata

Alongside the files themselves, you will need to collect information about your work. This may be about the context in which it was created, any publications it relates to, or how it has been created or changed. The purpose of documentation is to ensure that future users and readers of your work understand what it is, where it came from, how they can use it and how they should cite it. This information can be saved in a separate text file, known as a README file, which is kept alongside the files containing your work. See the University of Kent IS Research Support guidance for advice on what to include in a README file.
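A README is just a plain-text file, so you can write it by hand in any text editor. If you prepare many similar deposits, a small script can keep their READMEs consistent. The Python sketch below shows one hypothetical way to do this. The headings you pass in (for example title, creator or how to cite) and the file list are placeholders. The University of Kent guidance, not this script, should decide what the README actually contains.

```python
from pathlib import Path


def build_readme(fields: dict[str, str], files: dict[str, str]) -> str:
    """Return plain-text README content.

    fields: heading -> text, e.g. {"Title": ..., "How to cite": ...} (headings are up to you).
    files:  file name -> one-line description of what that file contains.
    """
    lines = []
    for heading, text in fields.items():
        lines += [f"{heading}:", f"  {text}", ""]
    lines.append("Files:")
    for name, description in sorted(files.items()):
        lines.append(f"  {name} - {description}")
    return "\n".join(lines) + "\n"


def write_readme(folder: Path, fields: dict[str, str], files: dict[str, str]) -> Path:
    """Save the README as README.txt (UTF-8) alongside the files it describes."""
    path = folder / "README.txt"
    path.write_text(build_readme(fields, files), encoding="utf-8")
    return path
```

Listing each file with a one-line description means a future user can tell the preservation copy from the sharing copy without opening either.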

Your work will also be described by the information recorded when you upload your files to a formal archive or repository. This information, or metadata, is held in a library or archive catalogue record, or in the record on a publication or data repository such as the Kent Academic Repository (KAR) or the Kent Data Repository (KDR). Metadata structures in KAR and KDR follow standards designed to ensure interoperability between systems. They also improve the accuracy of search engines like Google and local resource discovery tools like Library Search.

Further guidance

See https://www.kent.ac.uk/library/research/index.html for more help with organising and preparing your work for archiving. If you need help with a specific project or work, our staff can advise on digitization of research data and outputs. Contact researchsupport@kent.ac.uk and we will make sure your query is directed to the best person to help.
