Macro functions
Macro functions make reading directories even more convenient. The macro function "dir" stores the names of all files in a folder in a local, which you can then loop over, e.g. to load and combine data, or to apply the same cleaning steps to all files in a certain folder (or with a certain name structure).
In the first example, we have a folder with datasets from different regions that we would like to combine. In the second example, we have a folder with datasets that contain different variables for the same observations (identified here by "state"). The same logic can be used to automatically create a codebook for all .dta files in a given directory (see the extension below).
** Example 1: Automatically load & append files
// Create a list with all files ending with ".dta" in the folder "data_1"
local files: dir "data_1" files "*.dta" // on Windows, add the option ", respectcase" to keep upper-/lowercase in the file names
// Extract the first file name in local "first", put all other names in local "rest"
gettoken first rest: files
// Load the first file
use "data_1/`first'", clear
// Append the other files using a loop
foreach f of local rest {
    append using "data_1/`f'"
}
save "data_all_regions.dta", replace
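** Optional check for Example 1: inspect the file list before loading
// A minimal sketch, not part of the original example: it counts the files that
// "dir" found and stops the do-file if the folder contains no .dta files.
// The local name "n_files" is an assumption chosen for illustration.
local files: dir "data_1" files "*.dta"
local n_files: word count `files'
display "Found `n_files' file(s) in data_1:"
display `"`files'"'
if `n_files' == 0 {
    display as error "no .dta files found in data_1"
    exit 601
}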
** Example 2: Automatically load & merge files
// Create a list with all files ending with ".dta" in the folder "data_2"
local files: dir "data_2" files "*.dta"
// Extract the first file name in local "first", put all other names in local "rest"
gettoken first rest: files
// Load the first file
use "data_2/`first'", clear
// Merge the other files using a loop ("nogen" suppresses _merge, which would otherwise block the next merge; "assert(3)" requires every state to match)
foreach f of local rest {
    merge 1:1 state using "data_2/`f'", assert(3) nogen
}
save "data_all_variables.dta", replace
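** Optional check for Example 2: verify the merge key in every file
// A minimal sketch, not part of the original example: a 1:1 merge requires that
// "state" exists and uniquely identifies the observations in each dataset.
// This loop could be run before the merge loop to see which file would fail.
// Note that it replaces the data in memory.
local files: dir "data_2" files "*.dta"
foreach f of local files {
    use "data_2/`f'", clear
    capture isid state
    if _rc {
        display as error "`f': state is missing or does not uniquely identify observations"
    }
}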
** A convenient extension: Automatically generate a codebook from all files
// Create a list with all files ending with ".dta" in the folder "data_2"
local files: dir "data_2" files "*.dta"
local n = 1
foreach f of local files {
    use "data_2/`f'", clear
    describe, replace // replaces the data in memory with a dataset of variable names, types, labels etc.
    gen dataset = "`f'" // records which dataset each variable comes from
    tempfile file_`n' // temporary files, as they are not needed later on
    save `file_`n''
    local ++n
}
local n = `n'-1 // after the loop, "n" equals the number of files + 1
use `file_1', clear // load the "codebook" for the first file
forv m = 2/`n' {
    append using `file_`m'' // append all other codebooks (mind the nested macro quotes!)
}
compress
drop position format
la var dataset "data set"
export excel using "Codebook.xlsx", first(varl) replace
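** A possible variant of the extension: collect the codebook in a single tempfile
// A minimal sketch, not part of the original example: instead of one tempfile per
// dataset plus a counter, each description is appended to one growing tempfile.
// The local name "codebook" is an assumption chosen for illustration. The files
// end up in reverse order compared with the version above.
local files: dir "data_2" files "*.dta"
clear
tempfile codebook
save `codebook', emptyok // start from an empty dataset
foreach f of local files {
    use "data_2/`f'", clear
    describe, replace
    gen dataset = "`f'"
    append using `codebook'
    save `codebook', replace
}
use `codebook', clear // the combined codebook, ready for the export steps above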