Abstraction & automation
Use abstract definitions
Keep your code as abstract as possible and avoid “hard coding” values. The reason is simple: eventually, your data will change. There will be additional data cleaning, data updates, or other changes to your data. If your code contains definitions based on the current data (e.g., a new variable defined from the current mean, typed in as a number), they become wrong as soon as the data changes, without you even noticing it!
Instead, use
- return & ereturn objects and system variables
- macros & macro functions
Example using the return list of the sum command to create a new variable
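A minimal sketch of what such an example could look like, assuming the built-in auto dataset stands in for your data; the variable name price_above_mean is made up for illustration. Typing return list after summarize shows everything the command stored.

```stata
* Load a built-in example dataset (assumption: it stands in for your own data)
sysuse auto, clear

* summarize stores its results in r(), e.g., r(mean), r(sd), r(N)
summarize price

* Use the stored mean instead of typing the number by hand
* (price_above_mean is a hypothetical variable name)
generate byte price_above_mean = price > r(mean) if !missing(price)
```

If the data changes, rerunning the do-file updates the mean and the new variable together.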
Example of defining a global with the exchange rate to convert currencies
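A minimal sketch of the idea, assuming the built-in auto dataset and a made-up exchange rate of 0.90; the names usd_to_eur and price_eur are hypothetical.

```stata
sysuse auto, clear

* Hypothetical exchange rate (USD to EUR), defined once, e.g., in a master do-file
global usd_to_eur 0.90

* Every conversion refers to the global, so updating the rate means changing one line
generate double price_eur = price * $usd_to_eur
label variable price_eur "Price (EUR, rate: $usd_to_eur)"
```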
Use automated (export) tables & graphs whenever possible (see the chapter on advanced graphs and tables and on dynamic documents).
Minimize copy & paste
Definitions etc. should be made in one place only to prevent inconsistencies and errors. This also makes it easier to change the definitions later on.
- Most obvious example: Use loops for repetitive tasks
- Use the same do-file for definitions which re-occur at different steps, e.g., creating an index at base- & endline
- For more complex repetitive tasks: Write programs (.ado-files)
Screenshot of a loop replacing values and adding a note to variables
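A loop of this kind might look as follows. This is only a sketch: the auto dataset, the missing code -99, and the injected value in the first observation are assumptions made for illustration.

```stata
sysuse auto, clear

* Artificial value so the loop has something to clean (illustration only)
replace mpg = -99 in 1

* Hypothetical cleaning rule: -99 marks a missing value in these variables
foreach var of varlist price mpg weight {
    replace `var' = . if `var' == -99
    note `var': Recoded -99 to missing.
}
```

The rule is written once; adding a variable only means extending the list after varlist.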
Screenshot of two do-files calling on the same do-file named "wealth_quintiles"
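The same principle can be sketched with a small program instead of a shared do-file, which keeps the example self-contained. The program name make_wealth_quintiles and the suffix _q5 are hypothetical, and price and weight from the auto dataset merely stand in for a baseline and an endline wealth index.

```stata
capture program drop make_wealth_quintiles
program define make_wealth_quintiles
    * Hypothetical helper: takes one numeric variable and creates <varname>_q5
    syntax varname(numeric)
    xtile `varlist'_q5 = `varlist', nquantiles(5)
    label variable `varlist'_q5 "Quintiles of `varlist'"
end

sysuse auto, clear

* The definition lives in one place and is called wherever it is needed
make_wealth_quintiles price
make_wealth_quintiles weight
```

With a shared do-file, the two calls would instead be do commands pointing at the same file, so a change to the quintile definition reaches both steps.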
Use automated error checks
Write error checks in your code to make sure everything works as intended, using for example:
- isid: Checks whether a variable or a combination of variables uniquely identifies the observations. Use the option "missok" to allow missing values in the identifying variables.
- confirm: Covers many checks, e.g., whether a variable is numeric or whether a file exists.
- assert: Tests an expression, e.g., whether the sum of two variables equals a third variable for all observations (make sure to exempt missings, if applicable).
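The three commands could be used as in the following sketch, which assumes the built-in auto dataset; the file name survey_raw.dta is hypothetical.

```stata
sysuse auto, clear

* isid: stops with an error unless make uniquely identifies the observations
isid make

* confirm: stops unless price exists and is numeric
confirm numeric variable price

* confirm can also check for files; capture stores the return code in _rc
* instead of stopping (survey_raw.dta is a hypothetical file name)
capture confirm file "survey_raw.dta"
if _rc {
    display as text "survey_raw.dta not found"
}

* assert: stops unless the expression is true in every observation;
* the if qualifier exempts missing values
assert inrange(rep78, 1, 5) if !missing(rep78)
```

Because each check stops the do-file when it fails, a problem in the data surfaces at the point where it occurs rather than in the results.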
You can also use the "assert" option of the merge command to control the merge result. Example:
Screenshot of using the merge command, first with the command "assert" after it, then with the option assert.
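Both variants might look like this. The sketch assumes the auto dataset as master data and builds a small, made-up lookup dataset (the variable origin and its values are hypothetical) so that it runs on its own.

```stata
* Build a small lookup dataset: one row per value of foreign (hypothetical content)
clear
input byte foreign str8 origin
0 "domestic"
1 "imported"
end
tempfile origins
save `origins'

sysuse auto, clear

* Variant 1: the assert command after the merge
merge m:1 foreign using `origins'
assert _merge == 3
drop _merge origin

* Variant 2: the assert() option of merge; the merge stops with an error
* if any observation is not matched
merge m:1 foreign using `origins', assert(match) nogenerate
```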
Variable lists
Should you use variable lists (e.g., lifeexp* or price-length)? They can be useful if variables are consistently ordered and named. BUT: They can also easily lead to errors if the order or the names change! Consider using macros or the ds command instead:
Screenshot of 1) A local defining a variable list, followed by a loop over the local. 2) Using the ds command and then "recode" based on the return list.
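A sketch of both approaches, assuming the auto dataset; the local name size_vars and the missing code -99 are made up for illustration.

```stata
sysuse auto, clear

* 1) Name the variables explicitly in a local and loop over it
local size_vars weight length
foreach var of local size_vars {
    summarize `var'
}

* 2) Let ds build the list from a property instead of a position or name pattern
ds, has(type numeric)
* ds stores the list in r(varlist); recode uses it directly
* (-99 is a hypothetical missing code)
recode `r(varlist)' (-99 = .)
```

In both cases the list no longer depends on the order of the variables in the dataset.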
Please note that some commands accept abbreviated variable names as input, e.g., “med” instead of “medage” (not to be confused with the wildcard “med*”). This can easily lead to mistakes → use set varabbrev off.
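The difference can be seen in a short sketch, assuming the built-in census dataset, which contains the variable medage:

```stata
sysuse census, clear

* With abbreviations on, "med" silently resolves to medage
set varabbrev on
summarize med

* With abbreviations off, the same command fails instead of guessing;
* capture noisily shows the error message without stopping the do-file
set varabbrev off
capture noisily summarize med
display "return code: " _rc
```

Placing set varabbrev off at the top of a master do-file applies the setting to everything that runs afterwards in that session.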