egen functions
sysuse census.dta, clear
/*
In contrast to generate, egen can only be used with specific functions - and these
functions can only be used with egen.
*/
* Some functions serve the same purpose as using generate
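gen pop_til_17 = poplt5 + pop5_17
egen pop_til_17_alt = rowtotal(poplt5 pop5_17)
* Note: pop* matches every variable whose name starts with "pop", including pop_til_17 and pop_til_17_alt created above
egen pop_all = rowtotal(pop*) // advantage of egen: can use varlists
br pop_til_17 pop_til_17_alt pop_all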
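* Sketch (not part of the original example): gen with + and egen rowtotal() differ
* once a value is missing. The demo_* variable names are made up for this illustration.
gen demo_urban = popurban
replace demo_urban = . in 1 // make one value missing for the demonstration
gen demo_gen = poplt5 + demo_urban // missing wherever any component is missing
egen demo_egen = rowtotal(poplt5 demo_urban) // rowtotal() treats missing values as 0
list poplt5 demo_urban demo_gen demo_egen in 1/3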
* Others look the same but do different things
gen total_pop = sum(pop) // running sum: changes from observation to observation
egen total_pop1 = total(pop) // overall total, constant across observations
egen total_pop2 = sum(pop) // also the overall total; egen's sum() is an older synonym for total()
br total_pop*
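* Sketch (the demo_* names are made up for this illustration): the last value of the
* running sum equals the overall total. Both are stored as double here because the
* default float type cannot represent every integer of this size exactly.
gen double demo_running = sum(pop)
egen double demo_total = total(pop)
assert demo_running[_N] == demo_total[1]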
* Bysort and egen
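egen total_pop_region_1 = total(pop), by(region) // total sum by region
* by requires the data to be sorted by region; otherwise it stops with the error "not sorted".
* capture noisily shows the error message but lets a do-file continue.
capture noisily by region: egen total_pop_region_2 = total(pop)
bysort region: egen total_pop_region_2 = total(pop) // sorts by region first, then computes the total by region
br region total_pop_region_1 total_pop_region_2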
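* Sketch: bysort is shorthand for sorting first and then using by, and both give the
* same result as the by() option. demo_by and demo_opt are made-up names for this illustration.
sort region
by region: egen demo_by = total(pop)
egen demo_opt = total(pop), by(region)
assert demo_by == demo_opt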
* Combine egen functions
replace popurban = . if pop < 500000 // create missings for the example
egen miss_pop = rowmiss(pop*) // counts missing pop* variables for each observation
bysort region: egen miss_pop_region = total(miss_pop) // sums the number of missing pop* values by region
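* Sketch: total() also accepts an expression, so the missings of a single variable
* can be counted by region in one step. demo_miss_urban is a made-up name for this illustration.
bysort region: egen demo_miss_urban = total(missing(popurban))
tabulate region, summarize(demo_miss_urban)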
help egen
Exercise
Load the pre-installed dataset auto.
- Using the egen command, create a new variable that contains the lowest price of all cars in the dataset.
- Create a new variable which contains the difference between a car’s price and the lowest price using the generate command. What is the mean difference between a car’s price and the lowest price?
- Create a new variable which contains the number of non-missing observations for “rep78” by car type (“foreign”). How many observations are non-missing for domestic cars? How many for foreign cars?