Adventures in audiovisual digitisation* (part 3)

*Not really digitisation, more digital transfer

Because many of our depositors (comedians, promoters and producers) have worked on television and radio, we have been given copies of their contributions to these programmes as part of their collections. Material has been deposited on CD (both audio CD and CD-R) and on DVD. These are usually contributor copies given to them by the broadcaster or production company, although we do have a few ‘off-air’ recordings. We’ve also received published material, such as recordings of specific shows, tours, or compilations the depositor has appeared on, in formats including cassette, audio CD and DVD.

In this post I will focus on how we are capturing audio and video material deposited on CD and DVD, which Richard Wright nicely describes as ‘digital content not in files’ (page 9). ‘Digital content not in files’ refers to digital recordings which require specific technology and workflows to move the sound/images from their dedicated physical carriers (such as DAT, minidisc, and DV formats, as well as material held on optical media, such as audio CDs, CD-R and DVDs) into digital files (page 3).

CD from the Mark Thomas Collection of a recording from the Sheffield leg of his 2009 'It's the Stupid Economy' tour

Whilst optical media was previously seen as a preservation medium and storage solution, it is now recognised that optical discs are an ‘at-risk’ format (see ‘An Introduction to Optical Media Preservation’ by Alex Duryee), and so we have been transferring any high-priority material deposited on optical media (mostly unpublished material on re-writable discs) to digital files.

Our approach to this material has varied depending on the format. For material on audio CDs and CD-R we have viewed the physical format as a file carrier, a way to share and transport audio files, and we view the content on the disc as the important thing to capture (rather than the structure of the disc). However, for DVDs, which are more structured (often with a menu), we have created a disc image, which we can then open in tools such as VLC.

Audio CDs (Compact Disc Digital Audio / CD DA)

Audio CDs hold data in the Compact Disc Digital Audio (CD DA) format. The audio is stored as a pulse-code modulation (PCM) stream: two channels, 16 bit, at 44.1kHz. When an audio CD is placed in a disc drive, the operating system (Windows, for instance) presents each track as a file with the extension .cda.
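To give a feel for what those numbers mean in practice, here is a small Python sketch that works out the data rate of a CD DA stream and the approximate size of the uncompressed WAVE file it produces. The function names and the 44-byte header figure are my own assumptions for illustration (a minimal WAVE header; embedded metadata would add a little more).

```python
# CD DA stream parameters as described above.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BITS_PER_SAMPLE = 16


def cdda_bytes_per_second() -> int:
    """Bytes of PCM audio needed for one second of CD DA sound."""
    return SAMPLE_RATE_HZ * CHANNELS * (BITS_PER_SAMPLE // 8)


def wav_size_bytes(duration_seconds: float, header_bytes: int = 44) -> int:
    """Approximate size of an uncompressed WAVE file of this duration.

    header_bytes=44 is an assumption: the size of a minimal WAVE header.
    """
    return header_bytes + round(duration_seconds * cdda_bytes_per_second())


if __name__ == "__main__":
    print(cdda_bytes_per_second())   # 176400 bytes per second
    print(wav_size_bytes(60 * 60))   # 635040044 bytes for one hour
```

So an hour of audio from a CD comes to roughly 635 MB as a WAVE file, which is worth knowing when planning storage.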

After consideration we decided not to extract audio from audio CDs using a disc imaging workflow, but to extract the data and save as a WAVE file. We made this decision based on a number of factors.

  1. Firstly, it was the audio data itself which was important to us, rather than the structure of the disc.
  2. Secondly, because the discs we had were uncomplicated; many of the audio CDs contained only two .cda files (one of which was often a radio tone/test track) or were collections of edited tracks from live shows put onto a CD (but not published). Note that we prioritised material deposited on ‘unpublished’ (often re-writable) CDs and DVDs; we have not transferred any deposited material which has been published and is on mass-replicated discs.
  3. I think it would also be honest to say that, thirdly, disc imaging audio CDs seemed rather complicated and unnecessary for a relatively small number of discs within our collection. I’m slightly ashamed to say that this goes against the guidance provided by AVPreserve, the Open Preservation Foundation, and the DPC/British Library, and I would gladly be corrected if the digital preservation and archiving community thinks we should change our workflow! I would also be interested to hear from other small archives who are undertaking this sort of work, and whether they have disc imaged their CDs or taken a similar route to us.

Instead of disc imaging we extracted the audio data using Adobe Audition (a tool we were already using for digitising our sound cassettes and MiniDiscs), with the read speed set low to give results that were as accurate as possible. The data was originally written to the disc as PCM 16 bit/44.1kHz, so we extracted it with those same settings and used the WAVE (.wav) wrapper. The structure of the audio CD was maintained using filenames (numbered sequentially by track on the disc) and through metadata which we embedded in BWF format (using the BWF MetaEdit tool).
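A quick check along these lines can confirm that an extracted file really is two channel, 16 bit, 44.1kHz PCM. This is only a sketch using Python's standard `wave` module, not something from our actual workflow; the filename pattern and function names are hypothetical and chosen for illustration.

```python
import wave

# The parameters an extracted CD track is expected to have.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "frame_rate": 44_100}


def track_filename(prefix: str, track_number: int) -> str:
    """Build a sequentially numbered filename (hypothetical pattern)."""
    return f"{prefix}_{track_number:02d}.wav"


def wav_parameters(path) -> dict:
    """Read channel count, sample width and sample rate from a WAVE file."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16 bit
            "frame_rate": wav.getframerate(),
        }


def matches_cd_audio(path) -> bool:
    """True if the file has the same PCM parameters as CD audio."""
    return wav_parameters(path) == EXPECTED
```

For example, `matches_cd_audio(track_filename("disc", 1))` would check a file called `disc_01.wav` in the current folder. Zero-padding the track number keeps the files in disc order when sorted by name.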

We have also received CD data discs containing MP3 files. Although MP3 is not an archival format, the sound files are already compressed, so saving them as WAV files would increase the file size without improving the quality. MP3 is so widely used that it is unlikely to become obsolete in the immediate future, and so we consider it a low preservation risk for now. We have been capturing these MP3s through audio editing software, either Adobe Audition or Audacity. We export through software, rather than copying straight from the disc, because we understand the software to include an element of error correction, which should help prevent errors during the export/copy.
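For comparison, a different way to guard against copy errors (not the one we used) is to copy the file directly and then compare checksums of the source and the copy. The Python sketch below shows the idea; the function names are hypothetical. Note that a matching checksum only shows the copy matches what the drive returned, not that the drive read a damaged disc correctly.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination) -> str:
    """Copy a file, then confirm source and copy have the same checksum.

    Returns the checksum so it can be recorded alongside the file.
    """
    shutil.copyfile(source, destination)
    source_hash = sha256_of(source)
    copy_hash = sha256_of(destination)
    if source_hash != copy_hash:
        raise IOError(f"Checksum mismatch copying {source} to {destination}")
    return copy_hash
```

The returned checksum can be kept with the file's metadata and re-checked later to confirm the file has not changed in storage.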

DVDs

With rewritable media accessioned into the British Stand-Up Comedy Archive (BSUCA) collections (such as hard drives and floppy disks), or media with built-in menu functionality (i.e. DVDs), we thought it was important to create a disc image, a sector-by-sector copy, as part of the process of digitally preserving the original accession. Our aim was to:

  • Ensure that the discs/drives are free from viruses.
  • Capture an ‘image’ of the disc/drive, showing the structure of the files (including folder structure) on the original disc as it was when deposited with BSUCA.
  • Secure the contents of the disc/drive (i.e. the documents/files on the disc itself).

We have used the free version of IsoBuster to image DVDs, and using this tool created an .iso file and a .cue file. The complete disc image (.iso file) serves as the preservation master, and from the .iso file we have then created an access copy as an MP4 (H.264) file, using VLC, for use in our reading room.
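One simple sanity check on a finished .iso file is sketched below in Python. It assumes the image uses 2048-byte sectors and carries an ISO 9660 file system (as DVD-Video discs normally do, alongside UDF), in which case the identifier "CD001" appears one byte into sector 16. The function names are hypothetical, and this is an illustration rather than part of our workflow; it does not prove the image is complete or error free.

```python
import os

SECTOR_SIZE = 2048            # bytes of user data per sector (assumed)
FIRST_DESCRIPTOR_SECTOR = 16  # ISO 9660 volume descriptors start here
ISO9660_ID = b"CD001"         # standard identifier in each descriptor


def has_iso9660_descriptor(path) -> bool:
    """True if the ISO 9660 identifier sits where the standard puts it."""
    offset = SECTOR_SIZE * FIRST_DESCRIPTOR_SECTOR + 1
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(len(ISO9660_ID)) == ISO9660_ID


def is_whole_number_of_sectors(path) -> bool:
    """True if the file size is an exact multiple of the sector size."""
    return os.path.getsize(path) % SECTOR_SIZE == 0
```

A file that fails either check is worth re-imaging or investigating before it is treated as the preservation master.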

Creating disc images of DVDs using IsoBuster

Next time… how we have been digitising VHS and transferring material on DVCam and MiniDV.

Further reading and helpful links

‘Preserving Moving Pictures and Sound’, Richard Wright, DPC [Digital Preservation Coalition] Technology Watch Report 12-01 March 2012, http://dx.doi.org/10.7207/twr12-01

‘An Introduction to Optical Media Preservation’, Alex Duryee, AVPreserve, http://www.avpreserve.com/wp-content/uploads/2014/04/OpticalMediaPreservation.pdf

‘Developing a Robust Migration Workflow for Preserving and Curating Hand-held Media’, Angela Dappert, Andrew Jackson, Akiko Kimura, http://arxiv.org/ftp/arxiv/papers/1309/1309.4932.pdf

‘Establishing a Workflow Model for Audio CD Preservation’, Tonisant, Open Preservation Foundation blog, http://openpreservation.org/blog/2013/11/19/establishing-workflow-model-audio-cd-preservation/