Abstraction & automation
Use abstract definitions
Keep your code as abstract as possible and avoid “hard coding” values. The reason is simple: eventually, your data will change. There will be additional data cleaning, data updates, or other revisions. If your code contains definitions based on the current state of the data (e.g., a new variable defined with the current mean typed in as a number), they will become wrong as soon as the data changes, without you even noticing it.
Instead, use
- return & ereturn objects and system variables
- macros & macro functions
Example using the return list of the sum command to create a new variable
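Since the screenshot is not available here, the idea can be sketched as follows. This is a minimal illustration, assuming Stata's shipped auto dataset and a hypothetical new variable name (price_centered); it is not the original example.

```stata
* Minimal sketch; the auto dataset and the variable name are assumptions
sysuse auto, clear

* Fragile: a mean typed in by hand goes stale when the data changes
* generate price_centered = price - 6165

* Abstract: summarize stores its results in r(); use them directly
summarize price
return list                                  // shows r(mean), r(sd), r(N), ...
generate price_centered = price - r(mean)
```

If the data is updated later, rerunning the do-file recomputes the mean, so the new variable stays correct without any manual change.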
Example of defining a global with the exchange rate to convert currencies
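A possible version of such a definition is sketched below. The rate, the global name (usd_per_eur), and the toy data are all made up for illustration.

```stata
* Minimal sketch; the rate, the names, and the data are hypothetical
clear
input price_eur
10
25
40
end

* Define the exchange rate once, ideally at the top of the do-file
global usd_per_eur 1.10

* Refer to the global wherever the rate is needed
generate price_usd = price_eur * ${usd_per_eur}
list
```

When the rate changes, only the line defining the global has to be edited.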
Export tables and graphs automatically whenever possible (see the chapters on advanced graphs and tables and on dynamic documents).
Minimize copy & paste
Define things in one place only to prevent inconsistencies and errors. This also makes it easier to change the definitions later on.
- The most obvious example: use loops for repetitive tasks.
- Use the same do-file for definitions that recur at different steps, e.g., creating an index at baseline and endline.
- For more complex repetitive tasks, write programs (.ado files).
Screenshot of a loop replacing values and adding a note to variables
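A loop of this kind might look like the sketch below. The variable names and the missing-value code -99 are assumptions chosen for illustration.

```stata
* Minimal sketch; the data and the code -99 are hypothetical
clear
input income expenses savings
1200 800 -99
-99 650 100
2300 -99 400
end

* One loop instead of three copied-and-pasted blocks
foreach var of varlist income expenses savings {
    replace `var' = . if `var' == -99
    note `var': -99 recoded to missing
}

* Display the notes attached to the variables
notes
```

If the rule changes (say, a different missing-value code), it is edited once inside the loop rather than once per variable.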
Screenshot of two do-files calling on the same do-file named "wealth_quintiles"
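The setup can be sketched roughly as below. Only the name wealth_quintiles comes from the caption. The other file names, the variable asset_index, and the contents of the shared do-file are assumptions. The block shows three separate files and is not meant to be run as a single script.

```stata
* --- wealth_quintiles.do (shared definition; contents are hypothetical) ---
* Assumes the data in memory contains a variable named asset_index
xtile wealth_q5 = asset_index, nquantiles(5)
label variable wealth_q5 "Wealth quintile"

* --- baseline.do (hypothetical file) ---
use "baseline.dta", clear
do "wealth_quintiles.do"

* --- endline.do (hypothetical file) ---
use "endline.dta", clear
do "wealth_quintiles.do"
```

Because both do-files call the same definition, a change to the quintile construction is made once and applies to both survey rounds.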
Use automated error checks
Write error checks in your code to make sure everything works as intended, using for example:
- isid: Checks whether a variable or a combination of variables uniquely identifies the observations. Use the option "missok" to allow missing values in the identifying variables.
- confirm: A tool for many checks, e.g., whether a variable is numeric or whether a file exists.
- assert: Tests an expression, e.g., whether the sum of two variables equals a third variable for all observations (make sure to exempt missing values, if applicable).
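The three commands could be used as in the following sketch, which assumes Stata's shipped auto dataset. The file name mydata.dta is hypothetical.

```stata
* Minimal sketch; the auto dataset and the file name are assumptions
sysuse auto, clear

* isid: stops with an error if make does not uniquely identify observations
isid make

* confirm: stops with an error if price is not a numeric variable
confirm numeric variable price

* confirm file: capture the return code to react instead of stopping
capture confirm file "mydata.dta"            // hypothetical file
if _rc != 0 {
    display "mydata.dta not found"
}

* assert: stops with an error if the expression is false for any observation
assert price > 0
assert inrange(rep78, 1, 5) if !missing(rep78)   // exempt missing values
```

Each check is silent when it passes and aborts the do-file when it fails, so problems surface at the point where they occur.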
You can also use the "assert" option of the merge command to control the merge. Example:
Screenshot of using the merge command, first with the command "assert" after it, then with the option assert.
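Both variants are sketched below with two small made-up datasets. The variable names and the expectation that every observation matches are assumptions for this illustration.

```stata
* Minimal sketch; data and names are hypothetical
clear
input id x
1 10
2 20
3 30
end
tempfile using_data
save `using_data'

clear
input id y
1 5
2 6
3 7
end

* Variant 1: merge, then check the result with the assert command
preserve
merge 1:1 id using `using_data'
assert _merge == 3
restore

* Variant 2: let merge check itself via the assert() option
merge 1:1 id using `using_data', assert(match) nogenerate
```

With assert(match), merge exits with an error if any observation is not matched, so no separate check is needed afterwards.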
Variable lists
Should you use variable lists (e.g., lifeexp* or price-length)? They can be useful if variables are consistently ordered and named. However, they can easily lead to errors if the order or the names change. Consider using macros or the ds command instead:
Screenshot of 1) A local defining a variable list, followed by a loop over the local. 2) Using the ds command and then "recode" based on the return list.
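The two approaches might look like the sketch below, assuming Stata's shipped auto dataset. The local name size_vars and the recode rule (-99 to missing) are made up for illustration.

```stata
* Minimal sketch; dataset, local name, and recode rule are assumptions
sysuse auto, clear

* 1) Spell out the variables once in a local, then loop over the local
local size_vars weight length turn
foreach var of local size_vars {
    summarize `var'
}

* 2) Let ds select variables by a property and reuse the returned list
ds, has(type numeric)
recode `r(varlist)' (-99 = .)
```

The explicit local does not depend on variable order, and ds selects by a stated property (here: numeric type) rather than by position or name pattern.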
Please note that Stata by default accepts unambiguous abbreviations of variable names as input, e.g., “med” instead of “medage” (not to be confused with the wildcard “med*”). This can easily lead to mistakes, so use set varabbrev off.
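The difference can be seen in a short sketch, assuming Stata's shipped census dataset, which contains a variable named medage.

```stata
* Minimal sketch; the census dataset is an assumption
sysuse census, clear

* With abbreviations allowed, "med" silently resolves to medage
set varabbrev on
summarize med

* With abbreviations switched off, the same line is an error
set varabbrev off
capture noisily summarize med
display "return code: " _rc

* The full name always works
summarize medage
```

Placing set varabbrev off at the top of a do-file turns a silent guess into a visible error.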