Folder structure and version control
Have a clear folder structure & file system
- Separate “raw” from prepared data, inputs from outputs, etc.
- Provide a README file in the main folder. It should contain all the information needed to understand the folder structure and run the do-files
Have a Master do-file:
- Contains settings, globals, etc.
- Runs all do-files in the correct order
- Recommended: Also provide the data and code needed to build the analysis dataset from the “raw” (de-identified) dataset
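As a rough illustration of the points above, a Master do-file could look like the following minimal sketch. All folder names, do-file names, and the global names (proj_dir, raw, prep, code, out) are hypothetical placeholders, not a prescribed layout.

```stata
* master.do: minimal sketch of a Master do-file
* (all paths, global names and do-file names are hypothetical)

* --- Settings ---------------------------------------------------------
clear all
set more off

* --- Globals ----------------------------------------------------------
* Project root: the only line another user should need to change
global proj_dir "/Users/anna/ownCloud/Project"

* Sub-folders derived from the root ("raw" kept separate from prepared data)
global raw  "$proj_dir/data/raw"
global prep "$proj_dir/data/prepared"
global code "$proj_dir/code"
global out  "$proj_dir/output"

* --- Run all do-files in the correct order ----------------------------
do "$code/01_clean_raw_data.do"        // raw (de-identified) -> cleaned
do "$code/02_build_analysis_data.do"   // cleaned -> analysis dataset
do "$code/03_analysis.do"              // analysis dataset -> tables/figures
```

Numbering the do-files in the order they run makes the sequence visible in the folder itself as well as in the Master do-file.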
Decide on a system for version control of files & documentation
Examples:
- GitHub (e.g. https://github.com/BITSS/wb_reusable_analytics)
- OSF (https://osf.io/)
- ownCloud also offers limited version control
Note:
- You can use creturn list to capture date/user/system (see below)
- You can use datasignature to check whether data changed and cf to see how two datasets differ
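The commands in this note can be tried out with a short sketch like the one below. It uses Stata's built-in auto dataset purely for illustration, and the change to price is an artificial edit made only to trigger the checks.

```stata
* Record who ran the code, when, and on which system (values from creturn list)
display "Run on `c(current_date)' at `c(current_time)' by `c(username)'"
display "System: `c(os)', Stata `c(stata_version)'"

* Example data shipped with Stata (used here only for illustration)
sysuse auto, clear

* Store a signature of the data as it is now
datasignature set, reset

* Keep a copy of the original data for a later comparison
tempfile original
save `original'

* Artificial change to one value
replace price = price + 1 in 1

* datasignature confirm returns an error because the data changed;
* capture noisily shows the message without stopping the do-file
capture noisily datasignature confirm

* cf lists the variables that differ from the saved copy
capture noisily cf _all using `original', verbose
```

Note that cf compares the data in memory with the file observation by observation, so both datasets need the same number of observations in the same order.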
Directories and paths
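Never use the Windows backslash "\" in file paths! It does not work on Mac and Linux, and it causes problems with globals: Stata treats a backslash placed directly before "$" as an escape character, so the global is not expanded. The forward slash "/" works on all operating systems, including Windows.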
Instead, there are two ways to define (flexible) file paths:
- Option 1: Set the working directory (in the Master do-file) and use relative file paths
  cd "/Users/anna/ownCloud/Project"
  use "data/raw/baseline.dta"
- Option 2: Put the directory in a global (in the Master do-file) and use the global for absolute file paths
  global dir "/Users/anna/ownCloud/Project"
  use "$dir/data/raw/baseline.dta"
I personally prefer option 2. With option 1, someone may already have set the working directory to a wrong folder that happens to contain the same subfolder data/raw and a dataset named "baseline.dta". This is especially likely if you stick to one folder and naming convention across projects. With option 2, you can make sure that nothing runs before the global is set (use confirm). Of course, this is only the "safer" option if the global has a more distinctive name than "dir".
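One possible way to implement this check is sketched below. The global name baseline_proj_dir and the path are hypothetical. The sketch combines assert (to catch an empty global) with confirm file (to catch a wrong folder).

```stata
* Hypothetical, project-specific global name and path
global baseline_proj_dir "/Users/anna/ownCloud/Project"

* Stop with an error if the global is empty (e.g. Master do-file not run)
assert "$baseline_proj_dir" != ""

* Stop with an error if the expected raw dataset is not where it should be
confirm file "$baseline_proj_dir/data/raw/baseline.dta"

* Only reached if both checks passed
use "$baseline_proj_dir/data/raw/baseline.dta", clear
```

Both commands end the do-file with an error when the check fails, so no later step can run against the wrong folder.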
There are also ways to get the correct filepath automatically, see for example:
- profile.do (https://julianreif.com/guide/#stata-profile)
- creturn list: c(username) (see DIME Master Do-file)
- creturn list: c(pwd) (IPA cleaning guide)
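As a rough sketch of the c(username) approach, the Master do-file can pick the project folder depending on who runs it. The usernames, paths, and the global name proj_dir below are invented for illustration.

```stata
* Select the project folder based on the current user
* (usernames and paths are hypothetical)
if "`c(username)'" == "anna" {
    global proj_dir "/Users/anna/ownCloud/Project"
}
else if "`c(username)'" == "ben" {
    global proj_dir "C:/Users/ben/ownCloud/Project"
}
else {
    display as error "Unknown user `c(username)': add your project path to the Master do-file"
    exit 198
}

* From here on, all paths are built from the global
use "$proj_dir/data/raw/baseline.dta", clear
```

Each collaborator adds one branch with their own path, and the rest of the code stays unchanged.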
Version control in Stata
Commands might work differently under different Stata versions. Use the command “version” to set the Stata version under which your code is interpreted (set it to the lowest version with which your code runs, so that as many users as possible can run it). Also check out the command ieboilstart by DIME, which harmonizes settings across users.
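A minimal sketch of how this might look at the top of the Master do-file follows. The version number 14 is only an assumed example; use the lowest version under which your own code actually runs.

```stata
* Interpret all following commands as Stata 14 would
* (14 is an assumed example, not a recommendation)
version 14

* A few settings that are often harmonized across users
clear all
set more off
set varabbrev off   // require full variable names to avoid ambiguous abbreviations
```

Users with a newer Stata can still run the code, while users with a Stata older than the stated version get an error at the version line.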
Version control of user-written commands is a bit trickier, as there is no automatic version control for them. Instead:
- Save all user-written commands you use in a separate folder so that others can use them in exactly the same version you did
- Load them all from that folder in the Master do-file
Examples: Master do-file by DIME / script by Julian Reif
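One way to do this, sketched here under assumptions, is to point Stata's ado-path to a folder inside the project. The folder code/ado and the global proj_dir are hypothetical names, and this is a simplified sketch rather than a copy of either example above.

```stata
* Hypothetical project root
global proj_dir "/Users/anna/ownCloud/Project"

* Install new user-written commands into the project folder
* instead of the personal PLUS folder
sysdir set PLUS "$proj_dir/code/ado"

* Search the project folder first when looking up commands
adopath ++ "$proj_dir/code/ado"

* Show the resulting search path for the log
adopath
```

With this setup, commands installed from SSC after these lines land in the project folder, so collaborators run the same version of each command that you used.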
Also see: Data management in the onboarding course by DIME (OSF)