The process

  • We attempted to create a reproducible workflow using R, RMarkdown, and GitHub
  • SAS output was incorporated as .png images (SAS can be run from RMarkdown, but we didn’t do this)
  • A website with all (currently untidy) code is hosted on GitHub Pages: https://bit.ly/2Nc11GR
  • The full GitHub repo is at https://github.com/CBDRH/ipdln_hackathon_2018

The model

We fitted the following logistic regression model to generate the adjusted mortality rate:

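R code

library(speedglm)  # provides speedglm(), a fast GLM fitter for large data

# Logistic regression of death on all covariates
expglm <- speedglm(dead ~ 
                     age_grp + sex + loinc_decile + rur_urb +
                     province + education + employ_status +
                     ab_id_dichot + mar_stat + generation,
                   family = binomial(), data = clean_data)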

SAS code

model dead(ref='Not Dead') = age_grp sex loinc_decile rur_urb province education employ_status ab_id_dichot mar_stat generation;
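The poster does not say how the adjusted mortality rate was derived from the fitted model. One common approach is marginal standardisation: set every person to each level of a covariate in turn and average the predicted probabilities. The following is a minimal sketch of that idea, assuming base R `glm()` in place of `speedglm()` and a small simulated dataset. The `toy` data frame, its three covariates, and the simulated effect sizes are hypothetical and are not taken from the hackathon data.

```r
# Hypothetical toy data standing in for clean_data (not the hackathon data)
set.seed(1)
n <- 2000
toy <- data.frame(
  sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  age_grp  = factor(sample(c("25-44", "45-64", "65+"), n, replace = TRUE)),
  province = factor(sample(c("BC", "ON", "QC"), n, replace = TRUE))
)

# Simulate deaths from an assumed logistic model
lp <- -3 + 0.3 * (toy$sex == "M") + 1.5 * (toy$age_grp == "65+") +
  0.2 * (toy$province == "QC")
toy$dead <- rbinom(n, 1, plogis(lp))

# Base R glm() used here as a stand-in for speedglm()
fit <- glm(dead ~ age_grp + sex + province, family = binomial(), data = toy)

# Adjusted odds ratios
odds_ratios <- exp(coef(fit))

# Marginal standardisation: assign everyone to each province in turn,
# then average the predicted probability of death
adj_rate <- sapply(levels(toy$province), function(p) {
  newdata <- toy
  newdata$province <- factor(p, levels = levels(toy$province))
  mean(predict(fit, newdata = newdata, type = "response"))
})

print(round(adj_rate, 3))
```

The same loop can be repeated for any other covariate in the model to get one adjusted rate per level.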

[Figure panels: Province, Age groups, Sex, Rural/Urban, Education, Employment status, Migrant generation, Income decile, Marital status]

Suggestions

If using synthpop to create synthetic data:

  • document the generation process (show us the code!)
    • the order of column generation matters!
    • relationships are only modelled with columns that have already been synthesised, i.e. those to the left of each column
  • provide replicates, i.e. multiple versions of the synthetic dataset
    • averaging estimates across replicates reduces the stochastic error in the synthetic estimates
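Both suggestions can be illustrated with a short sketch, assuming the synthpop package is installed. It fixes the column order through the `visit.sequence` argument of `syn()`, requests several replicates with `m`, and averages model coefficients across the replicates. The `toy` data frame, the variable order, and the choice of five replicates are hypothetical, and the plain average is only one simple way to pool estimates.

```r
library(synthpop)

# Hypothetical toy data (not the hackathon data)
set.seed(2018)
n <- 500
toy <- data.frame(
  sex     = factor(sample(c("F", "M"), n, replace = TRUE)),
  age_grp = factor(sample(c("25-44", "45-64", "65+"), n, replace = TRUE))
)
p <- plogis(-2 + 0.4 * (toy$sex == "M") + 1.2 * (toy$age_grp == "65+"))
toy$dead <- factor(ifelse(runif(n) < p, "Dead", "Not Dead"),
                   levels = c("Not Dead", "Dead"))

# Document the generation order explicitly: each column is synthesised
# conditional on the columns that come before it in this sequence
visit_order <- c("sex", "age_grp", "dead")

# m = 5 replicates; a fixed seed makes the synthesis reproducible
synth <- syn(toy, visit.sequence = visit_order, m = 5,
             seed = 2018, print.flag = FALSE)

# Fit the same model to every replicate (one column of coefficients each)
coefs <- sapply(synth$syn, function(d) {
  coef(glm(dead ~ sex + age_grp, family = binomial(), data = d))
})

# Averaging across replicates reduces the noise added by synthesis
pooled <- rowMeans(coefs)
print(round(pooled, 3))
```

Publishing a script like this alongside the synthetic data lets users see the generation order and reproduce or extend the replicates.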