Ordinal & categorical variables

  • Before you include your variables, think about how they are scaled and what the coefficients will mean
  • If you have ordinal variables, it is generally advisable to include each level as a dummy; otherwise you constrain each step to have the same marginal effect (e.g. the effect of completing primary school on income is likely to differ from the effect of completing secondary school)
  • Categorical variables must always be recoded as dummies because their levels do not have an ordinal interpretation
  • There are two fast ways of including a variable as dummies
    1. add the prefix i. to a variable
    2. add the prefix i. to a variable and write xi: before the command
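  • The difference between the two is that 2. will also automatically save the dummies as new variables in the dataset (named with the prefix _I)
  • Note that both drop the first dummy by default (the lowest level), which becomes the reference category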
* Including dummy variables and categorical variables
webuse nhanes2, clear
// region is a categorical variable
tab region
// this would work but not make sense (factor levels are arbitrary):
reg bmi age region

// instead use prefix i. to indicate categorical variable
// this implicitly creates dummies for each category 
reg bmi age i.region
	// check which region is left out - this is the reference category
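	// sketch: the reference category can be changed with the ib operator
	// (assumes region is coded 1-4, as shown by tab region above)
	// ib3. uses the level with value 3 as reference, ib(last). the highest level
reg bmi age ib3.region
reg bmi age ib(last).region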
	
// adding xi: before the command saves the dummies as variables (prefix _I)
xi: reg bmi age i.region
	// now check list of variables
	// each category has one variable, except reference category
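	// one possible way to list them: xi names its dummies _I<variable>_<level>,
	// so a wildcard on that stub should show one variable per non-reference region
describe _Iregion_*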

// variables that are already 0/1 dummies do not need prefix i.
// both regressions below give the same coefficient on female
tab female
reg bmi age female
reg bmi age i.female
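
// sketch for an ordinal variable (assumption: nhanes2 contains hlthstat,
// a self-rated health status variable with ordered levels; check with tab first)
tab hlthstat
// entered linearly, every one-step change is forced to have the same effect
reg bmi age hlthstat
// entered with i., each level gets its own coefficient relative to the reference
reg bmi age i.hlthstat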