The Love Your Data Week website has a wealth of information and resources; we have gathered some of the best resources and advice from the week below:
Data quality is the degree to which data meets the purposes and requirements of its use. Depending on the use, good-quality data may mean complete, accurate, credible, consistent, or simply “good enough” data.
- Examples of how not to prepare or provide data: http://okfnlabs.org/bad-data/
- Data quality assessment (provides a table of various quality dimensions and their definitions): Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211. http://doi.org/10.1145/505248.506010
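Two of the dimensions discussed in the Pipino et al. table, completeness and validity, are easy to measure in practice. The sketch below is a minimal illustration using only the Python standard library; the field names and validity rule are hypothetical examples, not taken from the paper.

```python
# Minimal data-quality checks: completeness (how many records have a
# value at all) and validity (how many values pass a plausibility rule).
# The records and the age rule below are illustrative assumptions.

def completeness(records, field):
    """Fraction of records with a non-empty value for `field`."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def validity(records, field, is_valid):
    """Fraction of records whose value for `field` passes `is_valid`."""
    ok = sum(1 for r in records if is_valid(r.get(field)))
    return ok / len(records)

records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing: hurts completeness
    {"id": 3, "age": 230},    # implausible: hurts validity
]

print(completeness(records, "age"))  # 2 of 3 records have a value
print(validity(records, "age", lambda v: v is not None and 0 <= v <= 120))
```

Scores like these are only proxies: a dataset can be complete and valid yet still fail the "fit for purpose" test that defines quality above.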
Good documentation signals that people can trust your data by enabling validation, replication, and reuse.
Why does having good documentation matter?
- It contributes to the quality and usefulness of your research and the data itself – for yourself, colleagues, students, and others.
- It makes the analysis and write-up stages of your project easier and less stressful.
- It helps your teammates, colleagues, and students understand and build on your work.
- It helps to build trust in your research by allowing others to validate your data or methods.
- It can help you answer questions about your work during pre-publication peer review and after publication.
- It can make it easier for others to replicate or reuse your data.
Best Practices for Project Metadata: http://ropensci.github.io/reproducibility-guide/sections/metaData/
Readme files are a simple, low-tech way to start documenting your data better. Check out the sample readme.txt from IU or Cornell University’s data working group guide for tips on using readme files.
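As a starting point, a readme can be as simple as a plain-text file answering who, what, when, and how for the dataset. The skeleton below is an illustrative sketch, not the IU or Cornell template; adapt the fields to your project.

```text
README.txt  (illustrative skeleton -- adapt fields to your project)

Title:            Short descriptive title of the dataset
Creator(s):       Names, affiliations, and contact information
Date collected:   Range of dates the data were gathered
Description:      One or two sentences on what the data are and why
Methods:          How the data were collected or generated
File inventory:   List of files with a one-line description of each
Variables:        Name, definition, units, and allowed values per column
License / reuse:  Terms under which others may use the data
Citation:         How you would like the dataset to be cited
```

Keeping this file in the same folder as the data, and updating it as the project evolves, is usually more sustainable than writing documentation at the end.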
What makes data good?
- It has to be readable and well enough documented for others (and a future you) to understand.
- Data has to be findable to keep it from being lost. Information scientists have started to call such data FAIR — Findable, Accessible, Interoperable, Reusable. One of the most important things you can do to keep your data FAIR is to deposit it in a trusted digital repository. Do not use your personal website as your data archive.
- Tidy data are good data. Messy data are hard to work with.
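"Tidy" has a concrete meaning: each variable is a column and each observation is a row. The sketch below shows one common tidying step in Python with pandas; the choice of tool and the toy table are assumptions for illustration (tidy-data workflows exist in R and elsewhere too).

```python
# Reshaping a "messy" wide table into tidy long form with pandas.
# The site/year data below are made-up values for illustration.
import pandas as pd

messy = pd.DataFrame({
    "site": ["A", "B"],
    "2015": [10, 20],   # one column per year: values hide in headers
    "2016": [12, 25],
})

# melt() moves the year headers into a column, giving one observation
# per row: (site, year, count).
tidy = pd.melt(messy, id_vars="site", var_name="year", value_name="count")
print(tidy)
```

Once in this shape, filtering, grouping, and plotting by year become one-liners instead of column-by-column bookkeeping.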
- Data quality is a process that runs from planning through to curating the data for deposit.
The data you find is only as good as the question you ask. Think of the age-old “who, what, where, when” criteria when putting together a question — specifying these elements helps to narrow the range of available data and can help direct where to look:
- WHO (population)
- WHAT (subject, discipline)
- WHERE (location, place)
- WHEN (longitudinal, snapshot)
One way to find data is to think about which organizations, industries, or disciplines might gather and/or disseminate data relevant to your question. If you’re looking for general, multidisciplinary data sets, check out sources like ICPSR (Inter-university Consortium for Political and Social Research) or Amazon Public Datasets. Lists of open data repositories, such as Open Access Data Repositories, can point to more discipline-specific data sets.
Legacy, heritage, and at-risk data share one common theme: barriers to access. Data recorded by hand (field notes, lab notebooks, handwritten transcripts, measurements, or ledgers), stored on outdated technology, or saved in proprietary formats are at risk.
Securing legacy data takes time, resources, and expertise, but it is well worth the effort: old data can enable new research, and its loss could impede future work.
- CODATA Data at Risk Task Group
- RDA Data Rescue Interest Group
- International data rescue portal
- Center for International Earth Science Information Network: “Data Rescue at a Scientific Data Center”
- Curating a 23-year oceanographic time-series
- Unlocking GATE: Gaining Access to Analog Data in a Digital World