egen functions

sysuse census.dta, clear

/*
Unlike generate, egen works only with its own set of functions, and these functions
are not available in generate. A few share a name with a generate function but
behave differently.
*/

* Some egen functions do the same job as an expression in generate
gen pop_til_17 = poplt5+pop5_17
egen pop_til_17_alt = rowtotal(poplt5 pop5_17)
* advantage of egen: it accepts varlists (note that pop* also matches the two variables created above)
egen pop_all = rowtotal(pop*)
br pop_til_17 pop_til_17_alt pop_all
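
* A minimal sketch of one difference the comparison above does not show:
* rowtotal() treats missing values as 0, whereas + returns missing as soon as
* one operand is missing. The names lt5_demo, sum_plus and sum_row are made up
* for this illustration and are not part of the dataset.
gen lt5_demo = poplt5
replace lt5_demo = . in 1 // set one value to missing for the demonstration
gen sum_plus = lt5_demo + pop5_17 // missing in observation 1
egen sum_row = rowtotal(lt5_demo pop5_17) // counts the missing value as 0
list state lt5_demo sum_plus sum_row in 1/3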

* Others look the same but do different things
gen total_pop = sum(pop) // creates the running sum, not constant over observations!
egen total_pop1 = total(pop) // total sum
egen total_pop2 = sum(pop) // same result as total(): within egen, sum() is an older synonym for total()
br total_pop*
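
* A quick way to see the difference, as a sketch: list the last observations.
* The running sum from generate keeps growing from row to row, while the egen
* totals hold the same value in every row.
list state pop total_pop total_pop1 in -3/l
assert total_pop1 == total_pop2 // total() and sum() agree within egen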

* Bysort and egen
egen total_pop_region_1 = total(pop), by(region) // total sum by region
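* by requires the data to be sorted by region, so this fails with "not sorted";
* capture noisily shows the error but lets the do-file continue
capture noisily by region: egen total_pop_region_2 = total(pop)
bysort region: egen total_pop_region_2 = total(pop) // sorts by region first, then computes the total by region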
br region total_pop_region_1 total_pop_region_2
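
* A possible check, as a sketch: the by() option and the bysort prefix should
* give identical results, and tabstat shows the same regional sums without
* creating a new variable.
assert total_pop_region_1 == total_pop_region_2
tabstat pop, by(region) statistics(sum)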

* Combine egen functions
replace popurban = . if pop < 500000 // create missings for the example
egen miss_pop = rowmiss(pop*) // counts missing pop* variables for each obs.
bysort region: egen miss_pop_region = total(miss_pop) // sums number of missing pop* variables by region
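* One possible way to inspect the result: the egen function tag() marks exactly
* one observation per region, so each regional count is listed only once.
* The name region_tag is made up for this illustration.
egen region_tag = tag(region)
list region miss_pop_region if region_tag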
help egen

Exercise

Load the pre-installed dataset auto.

  1. Create a new variable which contains the lowest price of all cars in the dataset using the egen command.
  2. Create a new variable which contains the difference between a car’s price and the lowest price using the generate command. What is the mean difference between a car’s price and the lowest price?
  3. Create a new variable which contains the number of non-missing observations for “rep78” by car type (“foreign”). How many observations are non-missing for domestic cars? How many for foreign cars?