Collapse & contract

Often, the observations in your data belong to a larger group: for example, you have observations at the state level, and states belong to regions. To run analyses at the higher (aggregate) level, you can use collapse. It replaces the data in memory with one row per group, holding summary statistics (e.g. mean, sum, max) computed within each group.


***************************
*** collapse & contract ***


/*
"collapse" replaces the data in memory with group-level statistics
(means, sums, counts, maxima, etc.)
*/


sysuse census, clear 

// let's aggregate all population variables into means by region
collapse pop*, by(region)   // mean is the default
br                          // collapse replaced the data in memory; the state-level rows are gone
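
// a sketch of one way to keep the original data: wrap collapse in
// preserve/restore (standard Stata commands, not part of the original example)
sysuse census, clear
preserve                    // store a temporary copy of the data in memory
collapse pop*, by(region)   // data in memory is now one row per region
list region pop             // inspect the regional means
restore                     // bring back the state-level observations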

sysuse census, clear 

// let's aggregate all population variables into maximum values by region
collapse (max) pop*, by(region)
br
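
// the same pattern should work for the other statistics mentioned above;
// a minimal sketch for regional totals with (sum)
sysuse census, clear
collapse (sum) pop*, by(region)
list region pop             // pop now holds each region's total population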

sysuse census, clear 

// to combine several statistics in one call, name each result with newvar = var
collapse (count) state_num = pop (mean) pop* (max) pop_max = pop, by(region)
br

// similar command: contract (reduces the data to one row per distinct value,
// with the number of occurrences stored in _freq)
sysuse census, clear 
contract state
br         // each state occurs only once, so _freq is 1 in every row

sysuse census, clear 
contract region
br         // one row per region; _freq counts how many states belong to it
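
// contract stores only frequencies by default; a sketch of how to add
// percentages with the percent() option. The variable names n_states and
// pct_states are illustrative choices, not from the original example.
sysuse census, clear
contract region, freq(n_states) percent(pct_states)
list                        // one row per region with its count and share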