Variable Types

Binary variables

sysuse census.dta, clear 

gen urban_share = popurban/pop

* Intuitive way to create indicator/binary/dummy variables
gen urban_majority = 1 if urban_share>0.5
replace urban_majority = 0 if urban_share<=0.50
tab urban_majority

* Shortcut
gen urban_majority_shortcut = urban_share>0.5
tab urban_majority urban_majority_shortcut
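
* Optional check, sketched as an addition to the steps above: assert exits with
* an error if the two indicators ever disagree, so no output means they match.
assert urban_majority == urban_majority_shortcut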

	* Common mistake: Missings are treated as infinitely large
	
	// generate missings for pedagogic purposes 
	replace urban_share = . if region==3
	
	// You want to create a variable indicating a majority of urban population
	gen urban_majority_wrong = urban_share>0.5			
	tab urban_majority_wrong
	tab region urban_majority_wrong
	/*
	The variable should be missing for region 3 (South), but it is coded as 1.
	Stata treats missing values as infinitely large (larger than any number),
	so "urban_share>0.5" is true both for states with an urban share above 50%
	and for states whose urban share is missing! To prevent this, add an
	if qualifier (or set the affected observations to missing with replace):
	*/
	gen urban_majority_right = (urban_share>0.5) if urban_share<.		
	// --> results in 0s, 1s and missings
	// instead of "if urban_share<.", you can also use "if !missing(urban_share)"
	// Check results 
	tab region urban_majority_right
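	
	// Sketch of the "replace" route mentioned above. The variable name
	// urban_majority_fix is made up for this illustration.
	gen urban_majority_fix = urban_share>0.5				// missings are coded 1 here ...
	replace urban_majority_fix = . if missing(urban_share)	// ... and are set back to missing here
	tab region urban_majority_fix, missing					// option "missing" also tabulates the missings
	assert urban_majority_fix == urban_majority_right		// both routes give the same variable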

* Turn categories into indicator/binary/dummy variables with tabulate 
tab region, generate(region_dummy)
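
* A quick look at what tabulate created, sketched under the assumption that
* region has four categories and no missings: one 0/1 variable per category,
* named region_dummy1, region_dummy2, ... in the sort order of region.
describe region_dummy*
assert region_dummy3 == (region==3)		// the third dummy flags region 3 (South)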

* Other useful functions
gen state_list = inlist(state,"Alabama","Oklahoma")		// abbrev. for (state=="Alabama" | state=="Oklahoma")
br state state_list
gen pop_range = inrange(pop,1e+6,2e+6)				// abbrev. for (pop>=1e+6 & pop<=2e+6)
br pop pop_range
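gen pop_high = cond(pop>10e+6,1,0) if !missing(pop)		// very flexible function: cond(condition, value if true, value if false)
// could specify any if-then outcome, e.g. cond(pop>10e+6,10,-10)
// Note: the optional 4th argument of cond() is returned only if the condition itself evaluates to missing.
// A comparison such as pop>10e+6 is never missing (a missing pop counts as larger), so missings are handled by the if qualifier.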
br pop pop_high
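
* cond() calls can be nested to build more than two categories. A sketch: the
* variable name pop_size and the cut-offs are chosen here for illustration only.
gen pop_size = cond(pop<1e+6, 1, cond(pop<=10e+6, 2, 3)) if !missing(pop)
tab pop_size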

Exercise

Load the pre-installed dataset auto.

  1. Generate a variable which indicates whether a car had a repair record above three. For how many cars was this the case?
  2. Summarize the variable “price” in detail. Generate a variable which indicates whether the car’s price is above the median (50th percentile). Check your results.
  3. Generate a variable which indicates whether a car costs between 4,000 and 6,000. How many cars are within this price range?