Virtual Population Data
To get CSV files containing the Virtual Population datasets, click Generate Data in the Virtual Data section of the project. Once the data has been generated, click Download and save the resulting files.
What’s included in the Virtual Population Data
A separate file is generated for each combination of simulation and scenario. For example, 500 simulations with 5 scenarios generate a total of 2,500 CSV files. A master file called “MANIFEST.csv” is also produced, which holds the key to each of the generated files. In the example of 500 simulations with 5 scenarios, the master file contains 2,500 rows, where each row contains the filename for one unique simulation and scenario. This information can then be merged with the files containing the simulated data and used for analysis outside of KerusCloud. To support this, a variable metadata file called “METADATA.json” is also provided, which details variable parameters and any Advanced Options and Estimand strategies.
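As a minimal sketch of the merge described above, the following Python reads MANIFEST.csv, opens each listed data file, and attaches the manifest keys to every data row. The column name `filename` and the function names are assumptions made for illustration; check the header row of your own MANIFEST.csv and pass the actual column name.

```python
import csv
import json
from pathlib import Path


def load_virtual_population(data_dir, filename_column="filename"):
    """Return all data rows, each merged with its MANIFEST.csv entry.

    `filename_column` is an assumed column name, not a documented one:
    inspect the header of your MANIFEST.csv and pass the real name.
    """
    data_dir = Path(data_dir)
    with open(data_dir / "MANIFEST.csv", newline="") as fh:
        manifest = list(csv.DictReader(fh))

    merged = []
    for entry in manifest:
        with open(data_dir / entry[filename_column], newline="") as fh:
            for row in csv.DictReader(fh):
                # Manifest keys come first; if a data column has the same
                # name as a manifest column, the data value wins.
                merged.append({**entry, **row})
    return merged


def load_metadata(data_dir):
    """Return the parsed contents of METADATA.json."""
    with open(Path(data_dir) / "METADATA.json") as fh:
        return json.load(fh)
```

All values are returned as strings, as read by the `csv` module; convert numeric columns before analysis. A quick sanity check is that the number of manifest rows equals simulations × scenarios (2,500 in the example above).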
How is each simulation file composed?
Each data file contains N rows per Group × Recruitment Site combination, where N is the Max Cohort Size for the Virtual Population. The downloaded data files contain the variable values after Advanced Options and Estimands have been applied. Note that where an Imputation or Estimand replacement strategy involves calculations on the observed data (e.g. replacement with the mean of a variable), the replacement values in the downloaded data may differ from those calculated for different sample sizes and allocations, because the calculations are evaluated on different subjects. Take this into account if you perform custom analysis on the downloaded data.
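The effect of evaluating a replacement calculation on different subjects can be illustrated with a small sketch. The values and the `impute_with_mean` helper below are hypothetical and only mimic mean replacement; they are not KerusCloud's implementation.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical values for six subjects; None marks a missing value.
full_cohort = [10.0, 12.0, None, 14.0, 20.0, 24.0]

# Replacement evaluated on the full cohort: mean of 10, 12, 14, 20, 24.
imputed_full = impute_with_mean(full_cohort)

# Replacement evaluated on a smaller sample (first four subjects only):
# mean of 10, 12, 14.
imputed_subset = impute_with_mean(full_cohort[:4])

print(imputed_full[2])    # 16.0
print(imputed_subset[2])  # 12.0
```

The same subject receives a different replacement value (16.0 versus 12.0) depending on which subjects contribute to the mean, which is why values in the downloaded files may not match those computed for a smaller sample size or a different allocation.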