Inspecting data

Before working with a dataset, get an overview of the data. The following commands are a good starting point. Consult the respective help files for all the options you can specify, and watch the video for an example of inspecting the lifeexp dataset.

  • Browse: look at the whole dataset, specific variables, or subsets of the dataset
  • Codebook: detailed overview of selected variable(s)
  • Summarize: a first glance at the distribution of a continuous variable
  • Tabulate: a first glance at the frequency table of a categorical variable; can also produce a two-way table that cross-tabulates two categorical variables
  • Count: display the total number of observations, or the number that fulfil a certain condition
  • Basic graphs: histogram displays the distribution of a continuous variable; scatter displays the relationship between two variables
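As a rough illustration, the commands above could be applied to the lifeexp dataset as in the following sketch. The choice of variables (country, region, lexp, gnppc) and the condition region == 1 are assumptions made for this example only; they are not taken from the video.

```stata
* Load the lifeexp example dataset that ships with Stata
sysuse lifeexp, clear

* Browse: open the data viewer for two variables and a subset of rows
* (interactive; the region == 1 condition is only an illustration)
browse country lexp if region == 1

* Codebook: detailed overview of a selected variable
codebook region

* Summarize: distribution of a continuous variable (detail adds percentiles)
summarize lexp, detail

* Tabulate: frequency table of a categorical variable
tabulate region

* Count: total observations, then those meeting a condition
count
count if missing(gnppc)

* Basic graphs: histogram of one variable, scatter of two
histogram lexp
scatter lexp gnppc
```

Typing help followed by a command name (for example, help tabulate) opens the corresponding help file with the full list of options.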

Exercise

  1. Open the Stata example dataset nlswork (it is loaded with webuse rather than sysuse)
  2. Inspect the dataset and find out what it is roughly about
  3. How many individuals are married?
  4. Among college graduates, how many are married?
  5. For how many individuals is the age variable missing?
  6. Check which races are recorded in the survey and their frequency.
  7. Compute the correlation between logarithmic wages and hours worked for college graduates
  8. Display this correlation graphically
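
One possible way to approach the exercise is sketched below; try it yourself first. The sketch assumes that msp (married, spouse present) is the variable meant by "married" and that collgrad, age, race, ln_wage and hours are the relevant variables; check these with describe or codebook before relying on them. Note also that nlswork is a panel, so count returns person-year observations rather than distinct individuals.

```stata
* 1. Open the dataset (downloaded from the Stata website)
webuse nlswork, clear

* 2. Get a rough idea of the contents
describe
summarize

* 3. Married observations (assumes msp == 1 means married)
count if msp == 1

* 4. Married observations among college graduates
count if msp == 1 & collgrad == 1

* 5. Observations with missing age
count if missing(age)

* 6. Races recorded and their frequencies
tabulate race

* 7. Correlation between log wages and hours for college graduates
correlate ln_wage hours if collgrad == 1

* 8. The same relationship shown graphically
scatter ln_wage hours if collgrad == 1
```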