Folder structure and version control

Have a clear folder structure & file system

  • Separate “raw” from prepared data, inputs from outputs, etc.
  • Provide a README file in the main folder: it should contain all information needed to understand the folder structure & run the do-files
  • Have a Master Do-File:
    • Contains settings, globals, etc.
    • Runs all do-files in the correct order
  • Recommended: Also provide the data & code to build the analysis dataset from the “raw” (de-identified) dataset
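
A Master do-file along these lines could look like the following minimal sketch. All folder names, do-file names, and the global name myproject_dir are hypothetical and only illustrate the idea of keeping settings in one place and running everything in order.

```stata
* master.do: hypothetical skeleton, adapt names and paths to your project

* Settings
clear all
set more off

* Globals (hypothetical names; raw and prepared data are kept apart)
global myproject_dir "/Users/anna/ownCloud/Project"
global raw      "${myproject_dir}/data/raw"
global prepared "${myproject_dir}/data/prepared"
global output   "${myproject_dir}/output"

* Run all do-files in the correct order (hypothetical file names)
do "${myproject_dir}/code/01_clean.do"
do "${myproject_dir}/code/02_build.do"
do "${myproject_dir}/code/03_analysis.do"
```

Numbering the do-files is one simple way to make the intended order visible in the folder itself.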

Decide on a system for version control of files & documentation

Note:

  • You can use the c() system values shown by creturn list to capture date/user/system (see below)
  • You can use datasignature to check whether the data have changed and cf to see how two datasets differ
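
The two notes above can be sketched as follows. The example uses the auto dataset shipped with Stata as a stand-in for your own data; the deliberate change to price is only there to trigger the checks.

```stata
* Log who ran the code, when, and on which system
display "Run on `c(current_date)' at `c(current_time)' by `c(username)'"
display "System: `c(os)', Stata `c(stata_version)'"

* Example data shipped with Stata (stands in for your own dataset)
sysuse auto, clear

* Store a signature of the data and keep a copy of the original
datasignature set
tempfile original
save "`original'"

* Deliberately change one value
replace price = price + 1 in 1

* datasignature confirm returns an error if the data changed since "set"
capture noisily datasignature confirm
display "Return code of datasignature confirm: " _rc

* cf compares the data in memory with the saved file, variable by variable
capture noisily cf _all using "`original'", verbose
```

The capture noisily prefix shows the messages but lets the do-file continue, which is convenient for a demonstration; in a real workflow you may prefer the do-file to stop when the data have changed.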

Directories and paths

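Never use the Windows backslash “\” in file paths! Backslashes don't work on Mac & Linux, and they cause problems when using globals: Stata treats a backslash directly before a macro (e.g. \$dir) as an escape character, so the global is not expanded. Forward slashes “/” work on all systems, including Windows.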

Instead, there are two possible ways to define (flexible) file paths:

  1. Set directory (in Master do-file) & use relative filepaths
    cd "/Users/anna/ownCloud/Project"
    use "data/raw/baseline.dta"
  2. Put directory in a global (in Master do-file) & use global for absolute filepaths
    global dir "/Users/anna/ownCloud/Project"
    use "$dir/data/raw/baseline.dta"

I personally prefer option 2. With option 1, someone might already have set the working directory to a wrong folder that happens to contain the same subfolder data/raw and a dataset named "baseline.dta", so the code would run on the wrong file without any error. This is especially likely if you stick to a folder and naming convention. With option 2, you can make sure that nothing runs before the global is set (use confirm). Of course, this is only the "safer" option if the global has a more unique name than "dir".
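
One way to implement such a check is sketched below. The global name myproject_dir is a hypothetical, more unique replacement for "dir", and using confirm file on the raw dataset is an assumption about what kind of check is meant.

```stata
* Hypothetical global name; set once in the Master do-file
global myproject_dir "/Users/anna/ownCloud/Project"

* Stop if the global is empty (e.g. a do-file was run without the Master do-file)
if "${myproject_dir}" == "" {
    display as error "Global myproject_dir is not set. Run the Master do-file first."
    exit 198
}

* Stop with an error if the expected file cannot be found under that path
confirm file "${myproject_dir}/data/raw/baseline.dta"

use "${myproject_dir}/data/raw/baseline.dta", clear
```

The first check is most useful at the top of the individual do-files, where the global may not have been defined yet.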

There are also ways to get the correct file path automatically.
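
One common approach, shown here as an illustration under assumptions and not necessarily the one the original example used, is to choose the path based on the user name that Stata reports in c(username). The user names, paths, and global name below are hypothetical.

```stata
* Pick the project folder depending on who runs the code
* (hypothetical user names and paths; replace with your own)
if "`c(username)'" == "anna" {
    global myproject_dir "/Users/anna/ownCloud/Project"
}
else if "`c(username)'" == "ben" {
    global myproject_dir "C:/Users/ben/ownCloud/Project"
}
else {
    display as error "Unknown user `c(username)': add your project path to the Master do-file."
    exit 198
}
```

Every collaborator adds one branch once, and nobody has to edit paths when switching computers.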

Version control in Stata

Commands might work differently under different Stata versions. Use the command “version” to set the Stata version under which your code is interpreted (set it to the lowest version that runs your code, so that as many users as possible can run it). Also check out the command ieboilstart by DIME to harmonize settings.
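
A minimal sketch of this, where the version number 14 is only an assumption and should be replaced by the lowest version your code needs:

```stata
* Interpret the following commands as Stata 14 would
* (the version number is an assumption; Stata 14 or newer is required to run this)
version 14

* Record which Stata runs the code and which version is in effect
display "Running Stata `c(stata_version)' with version set to `c(version)'"
```

Here c(stata_version) is the version of the installed Stata, while c(version) is the version currently set by the version command.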

Version control of user-written commands is a bit trickier, as Stata does not provide it automatically. Instead:

  • Save all user-written commands you use in a separate folder, so that others can use them in exactly the same version that you did
  • Make them all available from the Master do-file (e.g. by running the ado-files or by adding that folder to the ado-path)
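
A sketch of the ado-path variant, assuming the copies of the user-written ado-files are stored in a hypothetical folder code/ado inside the project; the global name and the command estout are likewise only examples.

```stata
* Hypothetical project folder and ado folder
global myproject_dir "/Users/anna/ownCloud/Project"

* Put the project's ado folder at the front of the search path,
* so Stata finds these copies before any other installed version
adopath ++ "${myproject_dir}/code/ado"

* Optional check: show which file Stata would use for a command
* (estout is only an example name and may not be installed)
capture noisily which estout
```

With this setup, collaborators do not need to install the commands themselves, and updates on their machines do not change your results.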

Examples: Master do-file by DIME / script by Julian Reif

Also see

Data management in the onboarding course by DIME: OSF