Reshape data
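Most of the time, your data varies across two (or more) dimensions, e.g. individuals and years. There are two ways to represent this data: the long form, with one row for each individual-year combination, and the wide form, with one row for each individual and a separate column for each year.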
You might need to reshape data if you need different forms for different analyses, or you want to combine (“merge”) datasets in a specific way.
***************
*** Reshape ***
/*
"reshape" converts data from the long form to the wide form and vice versa.
*/
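/*
A minimal sketch of the two forms, using made-up numbers (the variables id and
wage and their values are invented for illustration and are not part of the
dataset used below):
*/
clear
input id wage2020 wage2021
1 10 11
2 20 22
end
list // wide form: one row per id, one wage column per year
reshape long wage, i(id) j(year)
list // long form: one row per id-year combination, with a single wage variable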
sysuse nlswide1, clear
browse
/*
You can see that you have a panel with one row for each type of occupation and
several columns for the different years. For example, age in 1968 is captured in
variable age68, age in 1988 in variable age88, and so on.
Hence, the data is currently in the wide form. That is useful, for example,
for some bar graphs:
*/
graph bar wage68 wage88, over(occ)
/*
However, what if we would like to see the wage trend over the years? This does
not work easily, as we have neither a single wage variable nor a year variable.
So, let's reshape the data to the long form. For this, consider:
  What are the two dimensions of the data?
      Occupation and year
  Which variable(s) uniquely identify the observations in the wide data?
      occ --> i(occ)
  Which variable do we need to create?
      year (or whatever you would like to name it) --> j(year)
  Which variables vary by year, i.e. exist once per year in the wide data?
      count* collgrad* age* c_city* union* ttl_exp* tenure* hours* wage*
Hence, the command should look as follows:
*/
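// Optional checks before running the reshape command (a sketch, assuming the
// variable names listed above): isid exits with an error if occ does not
// uniquely identify the rows, and ds lists every variable matching the stubs.
isid occ
ds count* collgrad* age* c_city* union* ttl_exp* tenure* hours* wage*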
reshape long count collgrad age c_city union ttl_exp tenure hours wage, i(occ) j(year)
/*
Note that count, collgrad etc. are the parts of the variable names that the
year-specific variables have in common. The help file calls these "stubs". Make sure
that the stubs really identify the variables (and only the variables) meant for reshape.
*/
browse
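// Optional check (a sketch): in the long form, occ and year together should
// uniquely identify the rows, and each occupation should have exactly two rows.
isid occ year
bysort occ (year): assert _N == 2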
// Now, we have two rows for each occupation, i.e. one row for each occupation-year
// combination. Let's now draw the trend graph:
twoway line wage year, by(occ)
// This works now. However, the bar graph command now looks different (as does the
// resulting graph):
graph bar wage, over(year) over(occ)
// If we want to switch back, we can write
reshape wide count collgrad age c_city union ttl_exp tenure hours wage, i(occ) j(year)
// Once reshape has been run, Stata remembers the settings, so the shortcut is
reshape long // or, starting from long data: reshape wide
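// To see which settings Stata has remembered for the shortcut (a sketch; the
// exact output depends on the Stata version), one can type
reshape query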
// You can also switch the dimensions, e.g.
reshape wide count collgrad age c_city union ttl_exp tenure hours wage, i(year) j(occ)
// Now we have one observation for each year, and a separate set of variables
// (count#, wage#, etc., where # is the occupation code) for each occupation
// For a detailed discussion of reshape, see the PDF manual entry (linked at the
// top of the help file; type: help reshape)