Data: Types, formats, and availability

Date

Monday, January 12, 2026

Notes

After testing out our development environments, we discussed:

  1. A formal definition for data.
    • Collections of information.
    • Made up of observations or conceptualizations.
    • In the Earth sciences, often (though not always!) measurements of natural phenomena.
    • Made up of objects, each with a set of attributes.
  2. Kinds of attributes (nominal, ordinal, interval, ratio) and the fact that values may be either discrete or continuous.
  3. The difference between structured and unstructured data.
  4. The idea that measurements are uncertain.
  5. Data types and formats.
  6. Formal definitions of artificial intelligence (AI) and machine learning (ML), and what sort of data might be ideal for AI/ML problems.
    • AI is the field of producing behaviors that both mimic and extend human capabilities.
    • ML is a subcategory of AI that uses algorithms (models) to recognize patterns in data and generate insights.
  7. The fact that your data should be open and follow FAIR guiding principles. Doing so entails:
    • Preserving raw (i.e., unmodified) data.
    • Making the data available to others: typically online, freely accessible, under an open license, and deposited in a reliable archive (e.g., Zenodo, Dryad, PANGAEA, or Figshare).
    • Providing metadata and documentation that make the data easy to understand.
    • Storing data in a standard file format, so that they are easy to use.
    • Including a digital object identifier (DOI) so that data are citable.
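
As a rough illustration of items 2 and 4 above, the sketch below tags each attribute of a dataset with its kind (nominal, ordinal, interval, or ratio) and estimates the uncertainty of a repeated measurement. It was not part of the class discussion. The attribute names, the example values, and the helper functions `allowed_summaries` and `mean_with_standard_error` are all hypothetical, and the standard error of the mean is only one of several ways to express measurement uncertainty.

```python
from enum import Enum
from math import sqrt
from statistics import mean, stdev


class Scale(Enum):
    NOMINAL = "nominal"    # labels with no order, e.g. rock type
    ORDINAL = "ordinal"    # ordered categories, e.g. Mohs hardness
    INTERVAL = "interval"  # differences are meaningful, zero is arbitrary, e.g. temperature in °C
    RATIO = "ratio"        # true zero, so ratios are meaningful, e.g. mass in grams


# Hypothetical schema: each attribute of an object (here, a rock sample) has a scale.
SCHEMA = {
    "rock_type": Scale.NOMINAL,
    "mohs_hardness": Scale.ORDINAL,
    "temperature_c": Scale.INTERVAL,
    "mass_g": Scale.RATIO,
}

# Each scale permits everything the previous one does, plus one new kind of summary.
_ORDER = [Scale.NOMINAL, Scale.ORDINAL, Scale.INTERVAL, Scale.RATIO]
_NEW_SUMMARY = {
    Scale.NOMINAL: "mode",
    Scale.ORDINAL: "median",
    Scale.INTERVAL: "mean",
    Scale.RATIO: "ratio",
}


def allowed_summaries(scale):
    """Return the summaries that are meaningful for an attribute of this scale."""
    summaries = []
    for s in _ORDER:
        summaries.append(_NEW_SUMMARY[s])
        if s is scale:
            break
    return summaries


def mean_with_standard_error(values):
    """Return the mean of repeated measurements and the standard error of that mean."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    for name, scale in SCHEMA.items():
        print(f"{name}: {scale.value} -> {allowed_summaries(scale)}")

    # Made-up repeated measurements of one sample's mass, in grams.
    repeated_mass_g = [10.1, 9.9, 10.0, 10.2, 9.8]
    best, err = mean_with_standard_error(repeated_mass_g)
    print(f"mass = {best:.2f} +/- {err:.2f} g")
```

Recording the scale alongside each attribute makes it harder to compute a summary that has no meaning, such as the mean of a nominal label, and reporting a measurement with its uncertainty keeps the fourth point in view whenever the data are reused.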