Data are information in a raw form: the basic unit of knowledge before analysis or interpretation takes place. Research data are the basis of every research project and paper. Open data are “data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike” (Open Data Handbook, https://opendatahandbook.org/). Open data from a single source can be reused to inform further research, and data from multiple sources can be combined to produce big data applications.
Why reuse data
Using data from reliable sources gives your work credibility. Use it to:
- test your ideas and assumptions
- develop new ideas
- provide evidence that backs up your research
- inform your methodologies and models
- enable interdisciplinary research
If data covering your area of interest have already been created, you can use them to inform your research without spending the time and money needed to repeat the exercise.
Sources of open data
Data is produced and published by numerous sources. These include:
- Administrative processes of supra-national, national or local government. Government data portals are a useful source of statistics and social information. Local councils publish this data because it saves them money and informs their population (Manchester City Council cites a £6 billion saving to the UK economy), but it can also be used as a source by researchers. The UK government publishes data across all its departments, and its site includes some useful tools like the Ordnance Survey Code-point with Polygons. The EU also publishes data through its european.eu portal, and organisations like the World Bank and the United Nations publish their data too.
- Output from the operation of machinery like microscopes or telescopes is often open to enable collaboration across geographical and subject boundaries.
- Academic research – funders, publishers and research institutions all require that data created during the research process and used to support claims made in publications are made available in an open access format.
- For funders, this means they get the best value from the funding they provide, as the data are reused and cited again each time.
- For publishers, it reduces the risk of retraction: reviewers are able to check the data before publication, and after publication the data are evidence of research rigour.
- For universities, data preserved and recorded in open repositories enable them to keep track of research performed in their sphere of influence and improve institutional reputation.
- For researchers, making their data open improves citation and collaborative opportunities, as well as demonstrating good practice.
How to find data
Look for data in reliable data repositories. To find them:
- look for standards like the CoreTrustSeal and the adoption of FAIR data principles
- choose data repositories that are supported by organisations you already trust
- go directly to a government or organisational website and search for their data portal
Places to find statistical data
- The Proceedings of the Old Bailey – fully searchable, covering 197,745 criminal trials held at London’s central criminal court.
- NHS Digital for NHS statistics and reusable patient data.
- UK Office for National Statistics – website of the UK government statistical service and source for their reports and publicly available datasets.
- Find open data – UK open government data search engine. This is an aggregator that does not host datasets but links out to where open data is stored.
- Eurostat – the statistical office of the European Union.
- USA.gov – the American government source of data and statistics.
- World Trade Organisation.
- World Bank Open Data.
Or use an aggregator:
- Google Public Data allows you to search several data sources at once and create visualisations.
Places to find research data
- http://www.re3data.org is a database of data repositories established by DataCite – the body used by data creators to issue Digital Object Identifiers (DOIs). To be included, repositories must meet international standards and be considered trustworthy. Use the search engine, or browse by subject, content type or country, to find relevant repositories.
- Google Dataset Search is an initiative by Google to provide a specialist search engine for data wherever it is stored across the world. The technology is new: its mechanisms for harvesting records are still being developed and depend upon common standards which are not yet universal. Its coverage is becoming more and more comprehensive.
- UK Data Archive is funded by the Economic and Social Research Council (ESRC) but contains data sets beyond those generated by its funded projects.
- Wellcome Open Research contains data from Wellcome-funded researchers.
- Zenodo is the repository for European Commission-funded research.
- The Open Science Framework (OSF) offers a free, open platform for scholars to share their research and generate collaboration.
- Publishers often link to, or publish, background data alongside their articles. Look for data when you are conducting your literature review – like this work.
- The NHS archives and shares its data to inform policy and to monitor and improve care.
How to find University of Kent data
The Kent Data Repository (KDR) records and archives data sets created at the University of Kent. Search for datasets or link to the data from publication and thesis records in the Kent Academic Repository (KAR).
In Google Dataset Search, use the term “University of Kent” to identify datasets created here. Be aware that the results may also include datasets from “The University of Kentucky” or “Kent State University”.
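If you export or copy a list of search results, a small filter can separate University of Kent datasets from the similarly named institutions. The sketch below is illustrative only: the `results` records and their `title` and `publisher` fields are hypothetical, not the format of any real search service. It compares the whole publisher name rather than searching for a substring, because “University of Kentucky” contains the text “University of Kent”.

```python
# Hypothetical search results: each record has a title and a publisher name.
results = [
    {"title": "Survey of student reading habits", "publisher": "University of Kent"},
    {"title": "Soil samples 2019", "publisher": "The University of Kentucky"},
    {"title": "Campus energy use", "publisher": "Kent State University"},
    {"title": "Bird counts in Canterbury", "publisher": "university of kent "},
]


def is_university_of_kent(publisher: str) -> bool:
    """Return True only when the publisher is exactly 'University of Kent'.

    The comparison ignores case and surrounding whitespace, and matches the
    whole name, so 'Kentucky' and 'Kent State' records are rejected.
    """
    return publisher.strip().casefold() == "university of kent"


kent_datasets = [r for r in results if is_university_of_kent(r["publisher"])]

for record in kent_datasets:
    print(record["title"])
```

Real records may spell the institution in other ways (for example with a department name attached), so treat an exact match as a first pass and check the remaining results by eye.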
Data about the University can be found on the website. Alternatively, look at the government’s HESA site and compare statistics about Kent with other Universities. The Times Higher Education World University Rankings provide different criteria for comparing the University of Kent with other institutions.
How can I reuse data
Datasets held in repositories will have information relating to how the data can be reused. This may be in the form of a written statement, terms that you need to acknowledge before the download can take place, or the use of a standard licence. There are several types of licence including:
- Creative Commons
- Open Data Commons
- Open Government Licence, used by government agencies and for any data that is held under Crown Copyright.
You should always cite the data source and creator. Most referencing styles tell you how to reference data. The UK Data Service provides advice if you are unsure.
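As an illustration, a data citation usually combines the creator, year, title, publisher (often the repository) and a persistent identifier such as a DOI. The sketch below assembles those elements in one common order; the `DatasetRecord` class, the `format_citation` function and the example values are all hypothetical, and your own referencing style may require a different order or punctuation.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """The elements of a dataset that a citation normally needs (assumed set)."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    identifier: str


def format_citation(record: DatasetRecord) -> str:
    """Join the elements as: Creators (Year). Title. Publisher. Identifier"""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}). {record.title}. "
        f"{record.publisher}. {record.identifier}"
    )


# Placeholder values only; this is not a real dataset or DOI.
example = DatasetRecord(
    creators=["Example, A.", "Sample, B."],
    year=2020,
    title="Hypothetical survey dataset",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.0000/example",
)

print(format_citation(example))
```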
Reliable data will be accompanied by information that describes the context and provenance of the data. Ensure that the origin of the data, the creators and the methodology are clear, and that the rights of the data subjects have been respected.
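If you are assessing many datasets, a simple checklist can flag records with gaps in their documentation. The sketch below assumes you have noted each dataset's details in a dictionary; the field names (`origin`, `creators`, `methodology`, `licence`) are illustrative choices, not a metadata standard. It cannot judge whether the rights of data subjects have been respected, which still needs a human reading of the documentation.

```python
# Fields assumed for this example; adapt them to the metadata you record.
REQUIRED_FIELDS = ("origin", "creators", "methodology", "licence")


def missing_provenance(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Hypothetical record: the methodology is blank and no licence is recorded.
record = {
    "origin": "Example Data Repository",
    "creators": ["Example, A."],
    "methodology": "",
}

print(missing_provenance(record))
```

An empty list means every field has been filled in; it does not mean the information itself is accurate.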