Abstraction & automation

Use abstract definitions

Keep your code as abstract as possible and avoid “hard coding” values. The reason is simple: your data will eventually change through additional data cleaning, data updates, or other revisions. If definitions in your code rely on the current state of the data (e.g., a new variable based on the current mean, typed in as a number), they become wrong as soon as the data changes, and you may not even notice.

Instead, use

return & ereturn objects and system variables

Example using the return list of the sum command to create a new variable
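A minimal sketch of this idea, assuming Stata's shipped auto dataset as stand-in data (the guide does not name a dataset) and a hypothetical variable name price_centered:

```stata
sysuse auto, clear

* Hard-coded (fragile): the mean is typed in by hand
* generate price_centered = price - 6165.257

* Abstract: take the mean from the stored results of summarize
summarize price
return list                                  // shows r(mean), r(N), r(sd), ...
generate price_centered = price - r(mean)

* System variables such as _N (number of observations) also update automatically
display "Observations: " _N
```

If the data is cleaned or updated later, r(mean) and _N change with it, so the definition stays correct without any edits.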

macros & macro functions

Example of defining a global with the exchange rate to convert currencies
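This could look roughly as follows. The exchange rate of 0.90 and the names usd_to_eur and price_eur are made up for illustration, and the auto dataset only serves as example data:

```stata
sysuse auto, clear

* Hypothetical exchange rate (EUR per USD), defined once at the top of the do-file
global usd_to_eur 0.90

* Every conversion refers to the global instead of repeating the number
generate price_eur = price * $usd_to_eur
label variable price_eur "Price in EUR (rate: ${usd_to_eur} EUR per USD)"

* Macro functions retrieve information instead of typing it in
local price_label : variable label price
display "`price_label'"
```

When the rate changes, only the line defining the global needs to be edited.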

Export tables and graphs automatically whenever possible (see the chapters on advanced graphs and tables and on dynamic documents).

Minimize copy & paste

Define things in one place only to prevent inconsistencies and errors. This also makes it easier to change the definitions later on.

Most obvious example: Use loops for repetitive tasks:

Screenshot of a loop replacing values and adding a note to variables
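A loop of this kind might be sketched as follows, assuming the auto dataset as example data and -99 as a hypothetical missing-value code:

```stata
sysuse auto, clear

* Apply the same recoding and the same note to several variables
foreach var of varlist price mpg weight {
    replace `var' = . if `var' == -99
    note `var': Recoded -99 to missing.
}

* List the notes that were attached
notes
```

Adding another variable only requires extending the list after "of varlist"; the recoding rule itself exists once.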

Use the same do-file for definitions that recur at different steps, e.g., creating an index at baseline and endline

Screenshot of two do-files calling on the same do-file named "wealth_quintiles"
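In outline, the setup could look like this. Only the do-file name wealth_quintiles comes from the screenshot; the dataset names, the variable asset_index, and the contents of the shared do-file are assumptions. Each part marked below is meant to be its own file:

```stata
* --- wealth_quintiles.do (shared definition, hypothetical contents) ---
* Expects a numeric variable asset_index in memory (assumed name)
xtile wealth_quintile = asset_index, nquantiles(5)
label variable wealth_quintile "Wealth quintile"

* --- baseline.do ---
use "baseline.dta", clear       // hypothetical file name
do "wealth_quintiles.do"

* --- endline.do ---
use "endline.dta", clear        // hypothetical file name
do "wealth_quintiles.do"
```

Because both do-files call the same definition, a change to the quintile construction applies to baseline and endline alike.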

For more complex repetitive tasks: Write programs (.ado-files)
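As a small illustration, here is a hypothetical program center_var that creates a mean-centered copy of a variable. It is defined inside the do-file here; saved as center_var.ado on the ado-path, it would be available in every session:

```stata
capture program drop center_var
program define center_var
    * Accept exactly one numeric variable
    syntax varname(numeric)
    quietly summarize `varlist'
    generate `varlist'_c = `varlist' - r(mean)
end

sysuse auto, clear
center_var price
center_var mpg
```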

Use automated error checks

Write error checks in your code to make sure everything works as intended, using for example:

isid
Checks whether a variable or a combination of variables uniquely identifies the observations. Add the option "missok" to allow missing values in the identifying variables.
confirm
Tool for many checks, e.g., whether a variable is numeric, whether a file exists etc.
assert
Tests an expression, e.g., whether the sum of two variables equals a third variable for all observations (make sure to exclude missing values, if applicable).
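The three commands might be used like this. The sketch assumes the auto dataset as example data; codebook.xlsx is a hypothetical file name:

```stata
sysuse auto, clear

* isid: stops with an error unless make uniquely identifies the observations
isid make

* confirm: check that price is numeric and that mpg exists
confirm numeric variable price
confirm variable mpg

* assert: every non-missing rep78 must lie between 1 and 5
assert inrange(rep78, 1, 5) if !missing(rep78)

* capture + _rc: react to a failed check instead of stopping immediately
capture confirm file "codebook.xlsx"
if _rc != 0 {
    display as error "codebook.xlsx not found"
}
```

Each check stops the do-file with an error when it fails, unless it is wrapped in capture as in the last step.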

You can also use the "assert" option of the merge command to control the result of the merge. Example:

Screenshot of using the merge command, first with the command "assert" after it, then with the option assert.
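Both variants can be sketched with two small files built from the auto dataset (an assumption made so the example is self-contained; run it as a do-file because it uses a tempfile):

```stata
* Build a "using" file with prices
sysuse auto, clear
keep make price
tempfile prices
save `prices'

* Build the master data
sysuse auto, clear
keep make mpg

* Variant 1: merge first, then check the result with the assert command
merge 1:1 make using `prices'
assert _merge == 3
drop _merge price

* Variant 2: let merge enforce the check itself via the assert() option
merge 1:1 make using `prices', assert(match) nogenerate
```

With assert(match), merge stops with an error as soon as any observation fails to match, so the check cannot be forgotten.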

Variable lists

Should you use variable lists (e.g., lifeexp* or price-length)? They can be useful if variables are consistently ordered and named.
BUT: They easily lead to errors if the order or the names change! Consider using macros or the ds command:

Screenshot of 1) A local defining a variable list, followed by a loop over the local. 2) Using the ds command and then "recode" based on the return list.
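Both approaches might look like this, assuming the auto dataset as example data, a hypothetical local size_vars, and -99 as a hypothetical missing-value code:

```stata
sysuse auto, clear

* 1) Name the variables explicitly in a local, then loop over the local
local size_vars weight length turn
foreach var of local size_vars {
    summarize `var'
}

* 2) Let ds select variables by property and use its stored result r(varlist)
ds, has(type numeric)
recode `r(varlist)' (-99 = .)
```

Neither version depends on the order of the variables in the dataset, unlike a range such as price-length.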

Please note that, by default, Stata accepts unambiguous abbreviations of variable names as input, e.g., “med” instead of “medage” (not to be confused with the wildcard “med*”).
This can easily lead to mistakes → use set varabbrev off
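The effect of the setting can be seen in a short sketch, assuming Stata's shipped census dataset, which contains the variable medage:

```stata
sysuse census, clear

set varabbrev on
summarize med            // runs: "med" is silently expanded to medage

set varabbrev off
capture summarize med    // now fails: no variable is called exactly "med"
display _rc              // nonzero return code
```

Placing set varabbrev off at the top of a do-file turns such silent expansions into visible errors.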

Helpful chapters

Loops

Tables

Advanced graphs with coefplot

Simple programs

Interaction with the system