Abstraction & automation
Use abstract definitions
Keep your code as abstract as possible and avoid “hard coding” values. The reason is simple: your data will eventually change through additional data cleaning, data updates, or other revisions. If your code contains definitions based on the current data (e.g., a new variable defined by typing in the current mean as a number), they will become wrong as soon as the data changes, without you even noticing it!
Instead, use
- return & ereturn objects and system variables
- macros & macro functions
Example using the return list of the sum command to create a new variable
![](material/Replication & transparency\blobs/blob13.png)
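A minimal sketch of this idea, assuming the auto example dataset that ships with Stata; the variable name above_mean is made up for illustration. The mean is taken from the return list of summarize rather than typed in, so it stays correct when the data changes.

```stata
sysuse auto, clear

* summarize stores its results in r(); r(mean) always reflects the current data
quietly summarize price

* Flag cars priced above the mean without hard-coding the mean itself
generate byte above_mean = price > r(mean) if !missing(price)
```

Typing return list right after summarize shows everything the command stored; estimation commands store their results in e(), which ereturn list displays.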
Example of defining a global with the exchange rate to convert currencies
![](material/Replication & transparency\blobs/blob14.png)
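A sketch of the same pattern with a global, again assuming the auto dataset. The exchange rate of 0.90 and the names usd_to_eur and price_eur are hypothetical placeholders, not values from the original example.

```stata
sysuse auto, clear

* Hypothetical exchange rate, defined once at the top of the do-file
global usd_to_eur 0.90

* Every conversion refers to the global, so a new rate is changed in one place only
generate price_eur = price * ${usd_to_eur}
label variable price_eur "Price in EUR (rate: ${usd_to_eur})"
```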
Export tables and graphs automatically whenever possible instead of copying results by hand (see the chapters on advanced graphs and tables and on dynamic documents).
Minimize copy & paste
Each definition should be made in one place only to prevent inconsistencies and errors. This also makes it easier to change the definition later on.
- Most obvious example: Use loops for repetitive tasks
- Use the same do-file for definitions that recur at different steps, e.g., creating an index at base- & endline
- For more complex repetitive tasks: Write programs (.ado-files)
Screenshot of a loop replacing values and adding a note to variables
![](material/Replication & transparency\blobs/blob15.png)
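A loop of this kind could look roughly as follows. The sketch assumes the auto dataset and a hypothetical placeholder code of -99 for missing values; the variables and the note text are illustrative only.

```stata
sysuse auto, clear

* The recoding rule and the note are written once and applied to every variable
foreach var of varlist price mpg weight {
    replace `var' = . if `var' == -99
    notes `var': -99 recoded to missing
}
```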
Screenshot of two do-files calling on the same do-file named "wealth_quintiles"
![](material/Replication & transparency\blobs/blob16.png)
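For more complex tasks, the shared definition can be wrapped in a program. The following is only a sketch under assumptions: the program name make_quintiles and its generate() option are invented for illustration, and it is run on the auto dataset rather than on survey data. The same lines could equally be stored in a shared do-file or an .ado-file.

```stata
* Define the quintile rule once; every later step calls the same program
capture program drop make_quintiles
program define make_quintiles
    syntax varname(numeric), GENerate(name)
    confirm new variable `generate'
    xtile `generate' = `varlist', nquantiles(5)
    label variable `generate' "Quintiles of `varlist'"
end

sysuse auto, clear
make_quintiles price, generate(price_q)
```

Because the rule lives in one place, changing it (say, to deciles) updates every step that calls the program.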
Use automated error checks
Write error checks in your code to make sure everything works as intended, using for example:
- isid: Checks whether a variable or a combination of variables uniquely identifies the observations. Use the option "missok" to allow missing values in the identifying variables.
- confirm: Tool for many checks, e.g., whether a variable is numeric, whether a file exists etc.
- assert: Tests an expression, e.g., whether the sum of two variables equals a third variable for all observations (make sure to exempt missings, if applicable).
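The three commands might be combined as in this sketch, which assumes the auto dataset; the specific conditions being checked are illustrative. Each command stops the do-file with an error if its check fails.

```stata
sysuse auto, clear

* make must uniquely identify the observations
isid make

* Both variables must exist and be numeric
confirm numeric variable price mpg

* Value checks, exempting missing values
assert price > 0 if !missing(price)
assert inrange(rep78, 1, 5) if !missing(rep78)
```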
Also, you can use the "assert" option of the merge command to control the result of the merge. Example:
Screenshot of using the merge command, first with the command "assert" after it, then with the option assert.
![](material/Replication & transparency\blobs/blob17.png)
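Both variants can be sketched as below. To keep the example self-contained, it splits the auto dataset into two parts with a temporary file; this setup is an assumption for illustration and not part of the original example.

```stata
sysuse auto, clear

* Set-up: store one variable in a temporary file to merge back in
tempfile extra
preserve
    keep make weight
    save `extra'
restore
drop weight

* Variant 1: merge first, then check the result with the assert command
merge 1:1 make using `extra'
assert _merge == 3
drop _merge weight

* Variant 2: merge itself stops with an error unless all observations match
merge 1:1 make using `extra', assert(match) nogenerate
```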
Variable lists
Should you use variable lists (e.g., lifeexp* or price-length)? They can be useful if variables are consistently ordered and named. However, they easily lead to errors if the order or the names change: a range such as price-length silently includes whatever variables currently sit between the two. Consider using macros or the ds command:
Screenshot of 1) A local defining a variable list, followed by a loop over the local. 2) Using the ds command and then "recode" based on the return list.
![](material/Replication & transparency\blobs/blob18.png)
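The two alternatives might look like this. The sketch assumes the auto dataset, a made-up local named sizevars, and a hypothetical placeholder code of -99.

```stata
sysuse auto, clear

* 1) Name the variables explicitly in a local and loop over it
local sizevars weight length turn
foreach var of local sizevars {
    quietly summarize `var'
    display "`var': mean = " %9.2f r(mean)
}

* 2) Let ds select variables by a property, then use its return list
ds, has(type numeric)
recode `r(varlist)' (-99 = .)
```

In the second variant the selection depends on the variable type rather than on position or naming, so reordering the dataset does not change which variables are recoded.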
Please note that Stata by default accepts unambiguous abbreviations of variable names as input, e.g., “med” instead of “medage” (not to be confused with the wildcard “med*”). This can easily lead to mistakes, so consider switching it off with set varabbrev off.
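The effect can be seen in a short sketch, assuming the census example dataset that ships with Stata (it contains medage and no other variable starting with "med").

```stata
sysuse census, clear
set varabbrev off

* "med" is no longer expanded to "medage", so this command now fails
capture summarize med
display "Return code: " _rc

* The full variable name still works
summarize medage
```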