System variables & subscripts
Subscripts
sysuse census.dta, clear
* Number of observations
gen state_number = _n // _n indicates the number of the current obs
sort pop // sort obs in ascending order according to pop size
gen state_number_new = _n // result depends on order of obs
br state_number*
/*
--> If you use _n to create IDs, make sure you sort the observations by a key that
uniquely identifies them directly beforehand! Otherwise, you might accidentally refer
to the wrong observations, e.g. when replacing values or matching.
*/
gen number_of_states = _N // _N is the number of the last obs, i.e. the total number of observations
br state_number* number_of_states
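* A minimal sketch of the safeguard described in the note above, assuming "state"
* is meant to identify each observation (state_id is a name chosen for illustration).
isid state          // stops with an error if "state" does not uniquely identify the obs
sort state          // unique sort key, so the resulting order is reproducible
gen state_id = _n   // _n now yields the same ID every time the do-file is run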
* By groups
sort region pop
by region: gen ranking=_n
order region state pop ranking
br
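* A possible alternative, sketched with illustrative variable names: bysort sorts and
* groups in one step. Variables in parentheses only determine the order within each group.
bysort region (pop): gen ranking2 = _n    // 1 = smallest population within the region
by region: gen states_in_region = _N      // with by, _N is the number of obs in the group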
* Subscripts
sort pop
gen lowest_pop = pop[1] // the number in brackets is the observation number; here: the first obs
gen highest_pop = pop[_N] // value of "pop" of the last observation
gen diff_pop_with_highest = highest_pop-pop
* In one line
gen diff_pop_with_highest2 = pop[_N]-pop
gen diff_pop_with_lowest = pop-pop[1]
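* Subscripts can also be combined with the by prefix; [1] and [_N] then refer to the
* first and last observation of each group. A sketch with illustrative variable names:
bysort region (pop): gen lowest_pop_region = pop[1]   // smallest pop within the region
by region: gen highest_pop_region = pop[_N]           // largest pop within the region
sort pop                                              // restore the ordering used in this section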
* Moving reference
gen nextsmaller_pop = pop[_n-1] // value of "pop" of the previous obs; missing for the first obs
gen diff_neighbour = pop-nextsmaller_pop
* In one line
gen diff_neighbour2 = pop-pop[_n-1]
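* A note on the edges, sketched with an illustrative variable name: observation 0 does
* not exist, so pop[_n-1] is missing for the first obs. Likewise, pop[_n+1] is missing
* for the last obs.
gen nextlarger_pop = pop[_n+1]
list state pop nextlarger_pop in -2/l   // the last two obs; nextlarger_pop is missing in the final one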
Exercise
Load the pre-installed dataset auto.
- Sort the observations by “foreign” and “price”. Use the “by” prefix to create two new variables which contain the lowest and highest price for each value of “foreign”.
- Create a variable which indicates the difference between the highest and the lowest price for each value of “foreign”. Is there a way to generate this variable without using the variables generated in the first task?
- Sort the observations by the variable “make”. Generate a new variable which gives the number of each observation. Can this new variable be used as a unique identifier? Hint: Run the codebook command on the variable “make” or use the command “isid”.
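One possible solution sketch (the new variable names are illustrative choices, not prescribed by the exercise):
sysuse auto.dta, clear
* Task 1: lowest and highest price within each value of "foreign"
sort foreign price
by foreign: gen min_price = price[1]
by foreign: gen max_price = price[_N]
* Task 2: the difference in one line, without the variables from task 1
by foreign: gen price_range = price[_N]-price[1]
* Task 3: observation number after sorting by "make"
isid make            // stops with an error if "make" is not unique
sort make
gen obs_number = _n  // a unique identifier only because "make" is unique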