Random variables

Note

If you do not work with random variables, or with analyses based on random numbers more generally, you can skip this subchapter.

// You can also generate variables based on random numbers. Functions are listed here:
help random

// Example
clear 
set obs 500000
// Creates a dataset with 500000 empty observations

// Standard normal distribution
gen standard_normal=rnormal()
hist standard_normal
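// Optional (a sketch): the "normal" option of histogram overlays a normal
// density curve, which makes it easier to compare the draws with the
// theoretical distribution
hist standard_normal, normal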

// Normal distribution with mean 10 and standard deviation 0.5
gen normal=rnormal(10,0.5)
hist normal

// Binomial distribution with 20 trials and success probability 0.5
gen binomial=rbinomial(20,0.5)
hist binomial 
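
// Optional numeric check (a sketch): with 500000 draws, the sample mean and
// standard deviation should be close to the parameters used above.
// Expected: standard_normal about 0 and 1, normal about 10 and 0.5,
// binomial about 20*0.5 = 10 and sqrt(20*0.5*0.5) = 2.24 (rounded)
summarize standard_normal normal binomial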

/*
If you create random variables, or use commands that are based on random draws,
Stata uses a pseudo-random number generator to produce the numbers. The sequence
of draws depends on the generator's starting value (the seed). If you run the same
command again, the generator continues from its current state, so you will get
different results. To make sure that your results can be replicated, set a seed
before the random function or command.
*/

// Let's draw a second variable from the standard normal distribution
gen standard_normal2=rnormal()
br standard_normal*
// The two variables contain different values

set seed 31459176
gen standard_normal3 = rnormal()
// Again, different values, but now we can replicate them

set seed 31459176
gen standard_normal4 = rnormal()
br standard_normal*
// standard_normal3 and standard_normal4 are identical!
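// Instead of browsing, the equality can also be checked directly (a sketch):
// assert stops with an error if the two variables differ in any observation
assert standard_normal3 == standard_normal4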

gen standard_normal5 = rnormal()
br standard_normal*
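// standard_normal5 is different again: the generator has moved on since the last "set seed".
// To reproduce specific draws, set the same seed again directly before generating the variable.
// Setting the seed once at the top of a do-file is enough to make the whole do-file replicable,
// as long as the commands are run in the same order.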

Exercise

Remark: See the help file “random” (help random) for useful functions.

Clear your workspace and set the number of observations to 10,000. Make sure that you can replicate the following steps later on.

  1. Generate a variable called “dice” based on the uniform distribution, containing integers between 1 and 6. Inspect the distribution of the variable. How many times was the number 6 drawn?
  2. Generate a variable called “x1” based on the normal distribution, with a mean of 150 and a standard deviation of 25. Generate a variable called “x2” based on the normal distribution, with a mean of 125 and a standard deviation of 30. Inspect the summary statistics and the distribution of the variables.
  3. Generate a variable called “id” based on the uniform distribution. Use the command “isid” to check whether the variable uniquely identifies the observations. If this is not the case, what could you change to create a unique identifier?