Data: Types, formats, and availability
Date
Monday, January 12, 2026
Links of interest
- The Wikipedia page on observational accuracy and precision.
- Schoene et al., 2013 explores accuracy and precision in geochronology.
- Wilkinson et al., 2016 introduced FAIR guiding principles.
Notes
After testing out our development environments, we discussed:
- A formal definition of data. Data are:
  - Collections of information.
  - Made up of observations or conceptualizations.
  - In the Earth sciences, often (though not always!) measurements of natural phenomena.
  - Made up of objects, each with a set of attributes.
- Kinds of attributes (nominal, ordinal, interval, ratio) and the fact that values may be either discrete or continuous.
- The difference between structured and unstructured data.
- The idea that measurements are uncertain.
- Data types and formats.
- Formal definitions of artificial intelligence (AI) and machine learning (ML), and what sort of data might be ideal for AI/ML problems.
  - AI is the field of producing behaviors that both mimic and extend human capabilities.
  - ML is a subcategory of AI that uses algorithms (models) to recognize patterns in data and generate insights.
- The fact that your data should be open and follow FAIR guiding principles. Doing so entails:
  - Preserving raw (i.e., unmodified) data.
  - Making the data available to others: typically online, freely accessible, deposited in a reliable archive (e.g., Zenodo, Dryad, PANGAEA, or Figshare), and released under an open license.
  - Providing metadata and documentation that make the data easy to understand.
  - Storing data in a standard file format, so that they are easy to use.
  - Including a digital object identifier (DOI) so that the data are citable.
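
As a rough illustration of the definition of data discussed above (objects with attributes, the four kinds of attributes, discrete versus continuous values, and measurement uncertainty), here is a minimal Python sketch. It is not code from the class: the names `Scale`, `Measurement`, `RockSample`, `ATTRIBUTE_SCALES`, and `total_mass` are hypothetical, and the sample values are invented. The uncertainty of the summed mass assumes independent errors, which add in quadrature.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """The four kinds of attributes (levels of measurement)."""
    NOMINAL = "nominal"    # unordered categories
    ORDINAL = "ordinal"    # ordered categories; spacing is not meaningful
    INTERVAL = "interval"  # meaningful differences; arbitrary zero
    RATIO = "ratio"        # meaningful differences and a true zero


@dataclass(frozen=True)
class Measurement:
    """A continuous value reported with its 1-sigma uncertainty."""
    value: float
    sigma: float
    unit: str


@dataclass(frozen=True)
class RockSample:
    """One object in a structured dataset; each field is an attribute."""
    lithology: str        # nominal
    mohs_hardness: int    # ordinal, discrete
    temperature_c: float  # interval, continuous (0 degrees C is arbitrary)
    mass: Measurement     # ratio, continuous


# Hypothetical lookup recording the kind of each attribute.
ATTRIBUTE_SCALES = {
    "lithology": Scale.NOMINAL,
    "mohs_hardness": Scale.ORDINAL,
    "temperature_c": Scale.INTERVAL,
    "mass": Scale.RATIO,
}


def total_mass(samples):
    """Sum the masses, assuming independent errors (added in quadrature)."""
    units = {s.mass.unit for s in samples}
    if len(units) != 1:
        raise ValueError("all masses must share exactly one unit")
    value = sum(s.mass.value for s in samples)
    sigma = math.sqrt(sum(s.mass.sigma ** 2 for s in samples))
    return Measurement(value, sigma, units.pop())


# Invented values, for illustration only.
samples = [
    RockSample("granite", 6, 21.5, Measurement(152.5, 0.5, "g")),
    RockSample("basalt", 6, 19.0, Measurement(98.5, 0.5, "g")),
]

total = total_mass(samples)
print(f"total mass: {total.value:.1f} ± {total.sigma:.2f} {total.unit}")
```

Summing is meaningful here only because mass is a ratio attribute. The same operation on `mohs_hardness` (ordinal) or `lithology` (nominal) would not be, which is one practical reason to record the kind of each attribute alongside the data.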