Readability II

Names

Where possible, your names should be descriptive and self-explanatory. This applies to:

  • Variables
  • Macros
  • Files

There are many possible naming conventions. Agree with your team on which conventions make sense for your project. Also keep in mind that you can attach labels and notes to variables and datasets to explain them further.

  • Example: Your data is based on a long questionnaire. Should variables be named after the question number (q_35_2) or a descriptive “title” (income_job_2)?
  • The former is unambiguous and easier to match with the supplementary material
  • The latter is easier to memorize and recognize when coding
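One possible compromise, sketched below, is to use the descriptive name and keep the question number as metadata. The data, the variable label text, and the characteristic name "question" are assumptions made up for this illustration.

```stata
// Hypothetical example data: one item from a questionnaire
clear
set obs 3
generate double q_35_2 = 1000 * _n

// Use the descriptive name and keep the question number as metadata
rename q_35_2 income_job_2
label variable income_job_2 "Income from second job"
notes income_job_2: Questionnaire item q_35_2
char income_job_2[question] q_35_2

// Retrieve the question number later
display "`: char income_job_2[question]'"
```

Whether the name or the metadata carries the question number is a team decision; the sketch only shows that both pieces of information can be kept.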

Labels and notes

Decide on meaningful variable labels and notes:

  • Variable labels are very easy to find → quick overview of variable content
  • But: You might not want all information in the label, as labels are used in outputs such as tables or graphs
  • Notes can provide more details
  • But: Not everyone knows about notes, so they might go unnoticed
  • Characteristics are a more advanced version of notes and can be very useful for categorizing variables & automation (more below)

Screenshot of indication of notes when using the describe command
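A minimal sketch of how a short label and a more detailed note can complement each other, using the auto dataset that ships with Stata. The label and note texts are chosen only for illustration.

```stata
sysuse auto, clear

// Short label for tables and graphs
label variable mpg "Mileage"

// Details go into a note instead of the label
notes mpg: Unit is miles per gallon.

// describe flags variables that have notes; notes list prints them
describe mpg
notes list mpg
```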

Also, use meaningful value labels and check their consistency, e.g. with labelbook (also check out its problems option) and uselabel. Use meaningful missing values where appropriate (e.g., .d for “don’t know”, .r for “refused”; see more below).
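As a small sketch, extended missing values can carry value labels too, so the reason for missingness stays visible in tabulations. The variable name answer and the label name lblAnswer are hypothetical.

```stata
// Hypothetical data with two extended missing values
clear
input answer
0
1
.d
.r
end

// Label the regular values and the extended missing values
label define lblAnswer 0 "No" 1 "Yes" .d "Don't know" .r "Refused"
label values answer lblAnswer

// Show the missing categories and check the label for potential problems
tabulate answer, missing
labelbook lblAnswer, problems
```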

Structure of the dataset

Before storing a dataset:

  • Order important variables such as identifiers, country names, dates/year at the top
  • Check for meaningful unique identifier(s)
  • Sort your data in a unique (reproducible) order
  • Check your value labels (e.g. for unused labels) with labelbook, problems
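The checklist above could look roughly like this in a do-file. This is a sketch on the auto dataset; the identifier car_id is a hypothetical variable created only for the example.

```stata
sysuse auto, clear

// Hypothetical identifier, created only for this example
generate int car_id = _n

// Move the identifier and the name to the front of the dataset
order car_id make

// Stop with an error if car_id does not uniquely identify observations
isid car_id

// Sorting on a unique identifier gives a reproducible order
sort car_id

// Summary report on potential problems with value labels
labelbook, problems
```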

Remember to provide further documentation material (e.g., questionnaires, metadata) outside of Stata.

Tips for value labels & missing values

Use the same value label for variables with the same definition, e.g.

label define     lblYesNo           0 "No" 1 "Yes"
label values     question_1 question_2 question_3   lblYesNo

Two advantages:

  1. You only need to change the definition in one place
  2. You can refer to these variables with the ds command

Screenshot of the ds command and the return list
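A minimal sketch of the second advantage, on made-up data: ds with the has(vallabel ...) option lists all variables that use a given value label and returns them in r(varlist). The variables below are hypothetical.

```stata
// Hypothetical data: three yes/no questions and one unrelated variable
clear
set obs 4
generate byte question_1 = mod(_n, 2)
generate byte question_2 = 1
generate byte question_3 = 0
generate byte age = 30 + _n

label define lblYesNo 0 "No" 1 "Yes"
label values question_1 question_2 question_3 lblYesNo

// List all variables that use the value label lblYesNo
ds, has(vallabel lblYesNo)

// The variable list is stored in r(varlist)
return list
display "`r(varlist)'"
```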

You can recode numeric codes as missing values with the mvdecode command.

For example, imagine the questionnaire defined “refused” as 88 and “don’t know” as 99. You want to code them as missing but keep the information on why the value is missing (→ extended missing values such as .r and .d):

mvdecode question_4, mv(88 = .r \ 99 = .d)

Combine this with the ds command:

Screenshot of the ds command, and the mvdecode command using the return list of the ds command to recode variables
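The combination might be sketched as follows, with hypothetical data. Only the variables that carry the value label lblYesNo are recoded, so a legitimate value of 88 in another variable (here income) is left untouched.

```stata
// Hypothetical data: 88 = refused, 99 = don't know in the yes/no questions
clear
input question_4 question_5 income
1 0 1200
88 1 88
0 99 3100
end

label define lblYesNo 0 "No" 1 "Yes" .r "Refused" .d "Don't know"
label values question_4 question_5 lblYesNo

// Collect all variables that use lblYesNo and recode them in one step
ds, has(vallabel lblYesNo)
mvdecode `r(varlist)', mv(88 = .r \ 99 = .d)

tabulate question_4, missing
```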

This makes it easier to ensure that every variable is coded correctly.

More on characteristics

*** Excursion: Why characteristics are great for automation ***

sysuse auto, clear

// Let's attach characteristics to the variables
// Here, I decided to attach the unit to each variable
char mpg[unit]			Miles per gallon
char headroom[unit] 		Inches
char trunk[unit]		Cubic feet
char weight[unit] 		Pounds
char length[unit] 		Inches
char turn[unit]			Feet
char displacement[unit] 	Cubic inches

// Now, we can call the characteristic with an extended local function
local c: char mpg[unit]
display `"`c'"'

// We can use this when creating graphs
hist mpg, xti("`: char mpg[unit]'")

// Thus, we do not need the unit in the variable label anymore
la var mpg		"Mileage"

// Adding this to our graph
hist mpg, xti("`: char mpg[unit]'") ti("`: var la mpg'")

// Let's change the other variable labels as well
la var headroom		"Headroom"
la var trunk 		"Trunk space"
la var weight		"Weight"
la var length 		"Length"
la var turn 		"Turn circle"
la var displacement 	"Displacement"

// We can use the "ds" command to list all variables with a "unit" characteristic
ds, has(char unit)

// We can use this list to loop over all variables and to create graphs
// Inside the loop, refer to `var' so that each graph gets its own label and unit
local unit_vars 	`r(varlist)'
foreach var of varlist `unit_vars' {
	hist `var', ti("`: var la `var''") xti("`: char `var'[unit]'") name(g_`var', replace)
}

/* 
Note that characteristics do not show up if you use "describe" or "codebook".
Instead, you can list the characteristics with "char list".
You can attach multiple characteristics to a variable.
*/
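A short sketch of the last two points: attaching more than one characteristic to a variable and listing them. The characteristic name "group" and its content are hypothetical.

```stata
sysuse auto, clear

// Attach two characteristics to the same variable
char mpg[unit] Miles per gallon
// Hypothetical second characteristic, e.g. for categorizing variables
char mpg[group] Performance

// List all characteristics of mpg
char list mpg[]

// The names of all characteristics of mpg can also be stored in a local
local mpg_chars : char mpg[]
display "`mpg_chars'"
```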