Standard error adjustments
For heteroskedasticity:
- For the OLS standard errors to be valid, the error term must be i.i.d. (independent and identically distributed)
- In reality, this condition is almost never fulfilled. Heteroskedasticity in particular is a common problem
- One way of dealing with heteroskedasticity is to use generalized least squares (GLS), i.e. you transform your data before running OLS to obtain correct SE. However, the exact nature of the heteroskedasticity must be known.
- You will most likely not have the necessary information to conduct GLS. Instead you have two options: 1. Use feasible GLS, also called estimated GLS (rarely done), 2. Adjust your standard errors by using a heteroskedasticity-robust variance estimator. In the past, this was done by adding the option ", robust" to your regression command. However, this is probably not the best estimator, as discussed here. Instead, it might be preferable to use ", vce(hc3)"
- Note that the robust variance estimator is less efficient than GLS, but GLS yields incorrect standard errors if the form of heteroskedasticity is misspecified
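To make the difference between the estimators concrete, the following is a minimal sketch in plain Python (not Stata) of what the classical, HC1 and HC3 standard errors compute for the slope of a regression with one regressor and a constant. It assumes that ", robust" corresponds to the HC1 variant and ", vce(hc3)" to HC3; the function name slope_standard_errors and the toy data are made up for illustration.

```python
import math


def slope_standard_errors(x, y):
    """Slope of y = a + b*x with classical, HC1 and HC3 standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)

    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Leverage of each observation in a regression with a constant.
    leverage = [1 / n + d * d / sxx for d in dx]

    # Classical variance: assumes one common error variance for all observations.
    var_classical = sum(e * e for e in resid) / (n - 2) / sxx
    # HC0 "sandwich": every squared residual enters with its own weight.
    var_hc0 = sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2
    # HC1: HC0 times the degrees-of-freedom correction n / (n - k), with k = 2.
    var_hc1 = var_hc0 * n / (n - 2)
    # HC3: each squared residual is inflated by 1 / (1 - leverage)^2.
    var_hc3 = sum(
        d * d * e * e / (1 - h) ** 2 for d, e, h in zip(dx, resid, leverage)
    ) / sxx ** 2

    return {
        "slope": slope,
        "se_classical": math.sqrt(var_classical),
        "se_hc1": math.sqrt(var_hc1),
        "se_hc3": math.sqrt(var_hc3),
    }


# Toy data invented for illustration: the size of the noise grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2 + 0.5 * xi + 0.3 * xi * (-1) ** xi for xi in x]
print(slope_standard_errors(x, y))
```

HC3 penalizes high-leverage observations more strongly than HC1, which is one reason it is often preferred in small samples.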
For autocorrelation / serial correlation:
- The i.i.d. assumption is also violated if error terms are autocorrelated, i.e. the error terms are correlated within a unit but not across units
- This requires several observations per unit
- One example is a panel dataset in which countries are observed over time. It is likely that the error terms for one country have something in common, inducing correlation over time (also known as serial correlation). Another example is correlation between household members due to shocks to the entire household
- The most common and simplest (but again, not the most efficient) way of dealing with this problem is to specify clustered standard errors
- To implement clustered SE, add the option ", vce(cluster variablename)" to your reg command (older syntax: ", cluster(variablename)")
- Note that clustering ties the available degrees of freedom to the number of clusters rather than the number of observations. E.g. with 5 clusters, the cluster-robust variance matrix has a rank of at most 4, so at most 4 coefficients can be tested jointly; with more variables in the model, your SE may not be valid
- One sign that you may have too few clusters is that Stata reports the overall F test statistic as missing
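As a rough illustration of what clustering does, here is a minimal sketch in plain Python (not Stata) of a cluster-robust standard error for the slope of a regression with one regressor and a constant. Residuals are summed within each cluster before squaring, so a shock shared by a whole cluster is not counted as several independent pieces of information. The finite-sample correction G/(G-1) * (N-1)/(N-k) is the common convention and is assumed here; the function name cluster_slope_se and the data are invented for illustration.

```python
import math
from collections import defaultdict


def cluster_slope_se(x, y, cluster):
    """Slope of y = a + b*x and its cluster-robust standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)

    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sum the score (x - x_bar) * residual within each cluster.
    scores = defaultdict(float)
    for d, e, g in zip(dx, resid, cluster):
        scores[g] += d * e
    n_clusters = len(scores)

    # Middle part of the sandwich: squared cluster totals, not squared residuals.
    meat = sum(s * s for s in scores.values())
    # Assumed finite-sample correction with k = 2 estimated coefficients.
    correction = n_clusters / (n_clusters - 1) * (n - 1) / (n - 2)
    variance = correction * meat / sxx ** 2
    return slope, math.sqrt(variance), n_clusters


# Toy panel invented for illustration: 4 countries, 3 observations each,
# with a common shock per country plus small idiosyncratic noise.
country = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [4.1, 3.9, 4.0, 4.1, 3.9, 4.0, 6.1, 5.9, 6.0, 10.1, 9.9, 10.0]

slope, se_clustered, n_clusters = cluster_slope_se(x, y, country)
# Treating every observation as its own cluster ignores the common shocks.
_, se_unclustered, _ = cluster_slope_se(x, y, list(range(len(x))))
print(slope, se_clustered, se_unclustered, n_clusters)
```

With only four clusters, as in this toy panel, the estimate itself is unreliable; the example is meant to show the mechanics, not a recommended sample size.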