Data Management: Organizing and Describing Data

File Formats

Formats more likely to remain accessible in the future are:

  • Non-proprietary
  • Based on an open, documented standard
  • In common use by the research community
  • Encoded in a standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Examples of preferred format choices:

  • PDF/A, not Word
  • ASCII, not Excel
  • MPEG-4, not Quicktime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS

Source: MIT 
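
As an illustration of the "ASCII, not Excel" preference, tabular data can be saved as plain-text CSV rather than as a spreadsheet file. The following is a minimal sketch using only the Python standard library. The function names, column names, and values are hypothetical, and CSV is assumed here as one reasonable plain-text choice rather than the only one.

```python
import csv
import io


def rows_to_csv_text(header, rows):
    """Serialize a header and data rows as plain-text CSV and return a string."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(path, header, rows):
    """Write the CSV text to disk as UTF-8, an open and documented encoding."""
    # newline="" stops Python from translating the line endings chosen above.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(rows_to_csv_text(header, rows))


# Hypothetical example data, for illustration only.
text = rows_to_csv_text(["site", "temp_c"], [["A", 21.5], ["B", 19.0]])
print(text)
```

A file written this way is unencrypted, uncompressed, and readable in any text editor, which matches the criteria listed above.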

Also see this detailed list of file formats from the University of Oregon Library. 

Documenting Your Data

Metadata is data about data. Descriptive information associated with your datasets helps you and others make sense of your data and cite your work properly.

If you plan to deposit your data in a repository, consult the repository directly about its metadata requirements. Most data repositories have their own metadata standards.

The University of Oregon Libraries and MIT Libraries have put together excellent overviews of metadata for research data. The Dublin Core metadata standard and DataCite's metadata schema are also good starting points. 
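
To make the idea of a metadata record concrete, here is a minimal sketch of a dataset description built from the fifteen elements of the Dublin Core Metadata Element Set. The record values and the `missing_elements` helper are hypothetical and for illustration only. A repository may require a different schema or serialization, so treat this as a starting point, not a required format.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record):
    """Return the Dublin Core elements that are absent or empty in record."""
    return [name for name in DUBLIN_CORE_ELEMENTS if not record.get(name)]


# Hypothetical example values, for illustration only.
record = {
    "title": "Stream temperature readings, example watershed",
    "creator": "Example Researcher",
    "date": "2020-01-15",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

# Store the record as plain-text JSON alongside the data it describes.
print(json.dumps(record, indent=2))
print(missing_elements(record))
```

All Dublin Core elements are optional, so the list of missing elements is a reminder of what else could be documented, not a validation failure.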

Questions to Consider

When your data is well organized, described, and documented, other people will be able to understand and re-use it. 

  • How will you document your data and project?
  • How will you organize your files into directories, and what naming conventions will you apply? 
  • Which file formats will you use for your data, and why?
  • What form will the metadata describing/documenting your data take?
  • How will you create or capture these details?
  • Which metadata standards will you use, and why have you chosen them (e.g. accepted domain-local standards, widespread usage)?
  • What contextual details (metadata) are needed to make the data you capture or collect meaningful?

(Sources: UC Berkeley, UMass Amherst, University of Michigan, Alix Keener, Creative Commons Attribution 4.0 license.)

Naming Conventions Best Practices

  • Be consistent and descriptive in naming, formatting, and organizing files.
  • Be specific and obvious about what the files contain.
  • File names should allow you to identify the precise experiment or research activity the file belongs to.

You might consider including some of the following information in your file names, but you can include any information that will allow you to distinguish your files from one another:

  • Project or experiment name or acronym
  • Location/spatial coordinates
  • Researcher name/initials
  • Date or date range of experiment
  • Type of data
  • Conditions
  • Version number of file
  • Three-letter file extension for application-specific files

(Adapted from Stanford University Libraries)
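
The naming elements listed above can be combined mechanically so that every file follows the same pattern. The following is a minimal sketch of one possible convention. The function names, the order of the parts, the underscore separator, and the example values are all assumptions chosen for illustration, not a prescribed standard.

```python
import re
from datetime import date


def slug(text):
    """Lowercase text and replace runs of non-alphanumeric characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project, data_type, initials, when, version, extension):
    """Join the naming elements into one consistent, sortable file name."""
    parts = [
        slug(project),            # project or experiment name
        slug(data_type),          # type of data
        slug(initials),           # researcher initials
        when.strftime("%Y%m%d"),  # date as YYYYMMDD so names sort chronologically
        f"v{version:02d}",        # zero-padded version number
    ]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


# Hypothetical example values, for illustration only.
print(build_file_name("Soil Survey", "pH readings", "JD", date(2020, 1, 15), 3, "csv"))
# prints "soil-survey_ph-readings_jd_20200115_v03.csv"
```

Writing the date as YYYYMMDD and zero-padding the version number keeps files in a sensible order when a directory listing is sorted by name.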

This work is licensed under a Creative Commons Attribution 4.0 International License.