The Early English Books Online Text Creation Partnership (EEBO-TCP)
The Early English Books Online Text Creation Partnership (EEBO-TCP) was established in 1999, with the aim of creating standardized, accurate XML/SGML encoded electronic text editions of early printed books. EEBO-TCP texts are transcribed and tagged by hand, based on the image sets which form the basis for ProQuest’s Early English Books Online (EEBO). In turn, EEBO-TCP’s work has become the basis for full-text searching within EEBO.
The EEBO corpus consists of the works printed in English between 1475 and 1700, encompassing literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The following are but a small sampling of the authors whose works are included: Erasmus, Shakespeare, King James I, Marlowe, Galileo, Caxton, Chaucer, Malory, Boyle, Newton, Locke, More, Milton, Spenser, Bacon, Donne, Hobbes, Purcell, Behn, and Defoe.
Phase I of EEBO-TCP production ran from 2001 to 2009, creating 25,363 searchable texts. So far in Phase II the project has added a further 22,971 texts, for a running total of 48,339. The project ultimately aims to complete a corpus of around 70,000 searchable electronic texts – one copy of every unique title printed in English between 1473 and 1700.
As Michael Ullyot put it, in his 2013 review of Digital Humanities Projects in Renaissance Quarterly: “The scale of the TCP’s data, both now and in the future, is staggering. Currently, with fewer than 60 percent of the texts released, the TCP already contains more than 900 million words.”
The TCP’s work, and the resulting text files, are jointly funded and owned by more than 150 libraries worldwide. Production for the Text Creation Partnership is based at the University of Michigan Library. The University of Oxford is the lead partner in the UK.
In addition, partner libraries and their users are welcome to locally store, host, manipulate, analyze and otherwise work with the encoded text files, just as if they had been created locally.
Ultimately, all of the TCP’s work will be placed into the public domain for anyone to use.
The 25,363 EEBO-TCP Phase I texts will be made freely available to the public on January 1, 2015.
Projects making large-scale use of the corpus include:
Smaller-scale scholarly projects using small numbers of EEBO-TCP texts, or even just one, include:
Research and Scholarship
Two conferences hosted by the University of Oxford in 2012 and 2013 explored current issues and applications of EEBO-TCP and related Digital Humanities projects. Proceedings from the 2012 event can be found at the Oxford Research Archive (in the ‘Conference/Workshop Papers’ tab).