System variables & subscripts

Subscripts


sysuse census.dta, clear

* Number of observations
gen state_number = _n 		// _n indicates the number of the current obs 
sort pop 			// sort obs in ascending order according to pop size
gen state_number_new = _n 	// result depends on order of obs 
br state_number*
	/*
	--> If you use _n to create IDs, make sure the observations are sorted uniquely
	directly beforehand! Otherwise, you might accidentally refer to the wrong observations,
	e.g. when replacing values or matching.
	*/
gen number_of_states = _N 	// _N is the number of the last obs, i.e. the total number of observations
br state_number* number_of_states
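
* Checking uniqueness before creating IDs (a minimal sketch; assumes census.dta
* is still in memory and that "state" identifies each observation)
isid state 			// exits with an error if state does not uniquely identify the obs
sort state 			// unique sort key, so the resulting order is reproducible
gen state_id = _n 		// can now serve as an ID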

* By groups
sort region pop 
by region: gen ranking = _n 	// within "by", _n restarts at 1 in each region; 1 = smallest pop
order region state pop ranking 
br
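
* Group size (a sketch building on the by-group example): within "by", _N is the
* number of observations in the current group, not in the whole dataset
bysort region (pop): gen states_in_region = _N 	// sorts by region and pop, groups by region
br region state pop states_in_region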

* Subscripts 
sort pop
gen lowest_pop = pop[1]		// the subscript in brackets is the observation number: value of "pop" of the first observation
gen highest_pop = pop[_N]	// value of "pop" of the last observation
gen diff_pop_with_highest = highest_pop-pop

* In one line 
gen diff_pop_with_highest2 = pop[_N]-pop
gen diff_pop_with_lowest = pop-pop[1]

* Moving reference
gen nextsmaller_pop = pop[_n-1]	// pop of the previous obs; missing for the first obs
gen diff_neighbour = pop-nextsmaller_pop

* In one line 
gen diff_neighbour2 = pop-pop[_n-1]
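
* Looking ahead (a sketch; the variable names are only illustrative): _n+1 refers
* to the next observation. A subscript outside the range 1 to _N returns a missing value.
sort pop
gen nextlarger_pop = pop[_n+1]
gen diff_next = nextlarger_pop-pop 		// missing for the last obs
assert missing(nextlarger_pop) if _n == _N 	// holds because pop[_N+1] does not exist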

Exercise

Load the pre-installed dataset auto.

(1) Sort the observations by “foreign” and “price”. Use the “by” prefix to create two new variables which contain the lowest and highest price for each value of “foreign”.
(2) Create a variable which indicates the difference between the highest and the lowest price for each value of “foreign”. Is there a way to generate this variable without using the variables generated in (1)?
(3) Sort the observations by the variable “make”. Generate a new variable which gives the number of each observation. Can this new variable be used as a unique identifier? Hint: Run the codebook command on the variable “make” or use the command “isid”.

Solution

Exercise_System variables and subscripts.do