Notes on weights in descriptive statistics
In some contexts, data are provided with weights.
Stata knows four kinds of weights: frequency weights (fweight), sampling weights (pweight), analytic weights (aweight) and "importance weights" (iweight):
- fweight: denotes the frequency of an observation, e.g. if an observation has an fweight of 100, Stata will pretend that there are 99 other identical observations
- pweight: denotes the inverse of the probability that an observation was selected into the sample. Used to correct survey estimates for over- and undersampling of certain groups
- aweight: typically used when the observations are themselves summary statistics, e.g. the weight tells us how many observations were used to calculate a mean
- iweight: arbitrary weights used for programming purposes
The most important weights for creating summary statistics are fweight and pweight.
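As a minimal sketch of a sampling weight in use, assume a small hypothetical sample in which urban households were oversampled. The variable names income, urban and pw and all values are invented for illustration. Note that summarize does not accept pweights, so the sketch uses mean instead.

```stata
* Hypothetical data: 4 urban households (oversampled) and 2 rural ones
clear
input income urban pw
40 1 5
50 1 5
60 1 5
50 1 5
20 0 20
30 0 20
end

* Unweighted mean: treats the sample as if it mirrored the population
mean income

* Weighted mean: each household counts for pw households in the population,
* so the estimate is pulled toward the undersampled rural households
mean income [pweight=pw]
```

Comparing the two outputs shows how the weight corrects for the oversampled group.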
Example:
Download the dataset fweights_example.dta below and open it in Stata. Then try out the commands below with and without the weight and note the differences.
The dataset contains an artificial list of the number of kids per household. Households with the same number of kids are grouped into a single row. We now need summary statistics. Without weights you would first have to apply extensive manipulations to the dataset.
Instead, we can do the following:
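* Summary statistics, counting each row as many times as its frequency
sum kids [fweight=frequency]
* The same statistics, requested explicitly
tabstat kids [fweight=frequency], stat(count mean sd min max)
* One-way frequency table of kids, with households as the counts
tab kids [fweight=frequency]
* Table of frequencies for each value of kids
table kids [fweight=frequency]
* Number of households for each value of kids
tabstat kids [fweight=frequency], by(kids) stat(count)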
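
To see what the frequency weight does, the weighted result can be compared with the manual route of expanding the data to one row per household. The sketch below uses a small hypothetical dataset in place of fweights_example.dta. Only the variable names kids and frequency are taken from the example, and the values are invented.

```stata
* Hypothetical toy data standing in for fweights_example.dta
clear
input kids frequency
0 3
1 5
2 2
end

* Weighted route: each row counts as many times as its frequency
summarize kids [fweight=frequency]
scalar n_weighted = r(N)
scalar mean_weighted = r(mean)

* Manual route: replace each row by that many identical copies
expand frequency
summarize kids
scalar n_expanded = r(N)
scalar mean_expanded = r(mean)

* Both routes should give the same number of observations and the same mean
assert n_weighted == n_expanded
assert reldif(mean_weighted, mean_expanded) < 1e-12
```

If both assert commands pass silently, the weighted summary and the expanded dataset agree.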