Dataset description 1
Research dataset 1 (dataset1.csv) contains 50,000 electronic health records with the following
variables:
Variable        Description
id              ID
sm_status       Smoking status
sex             Sex
age             Age at baseline
cancer          Prevalent cancer at baseline
dementia        Prevalent dementia at baseline
diuretics       Use of diuretics at baseline
bmi             Body mass index (kg/m²), measured at baseline
died            Died during follow-up
date_baseline   Date at baseline
date_end_fu     End of follow-up date
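The two date variables above can be combined into a follow-up time per record. The following is a minimal sketch only: it assumes the dates are stored as ISO strings (YYYY-MM-DD), which the description does not state, and the helper name add_follow_up_days and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import date


def add_follow_up_days(rows):
    """Return a copy of each record with a follow_up_days field added."""
    out = []
    for row in rows:
        # Assumption: dates are ISO formatted (YYYY-MM-DD)
        start = date.fromisoformat(row["date_baseline"])
        end = date.fromisoformat(row["date_end_fu"])
        out.append({**row, "follow_up_days": (end - start).days})
    return out


# Tiny made-up sample, standing in for dataset1.csv
sample = io.StringIO(
    "id,date_baseline,date_end_fu,died\n"
    "1,2010-01-01,2010-01-31,0\n"
    "2,2010-06-15,2011-06-15,1\n"
)
records = add_follow_up_days(csv.DictReader(sample))
```

To run this on the real file, the sample could be replaced with open("dataset1.csv", newline=""), adjusting the date parsing if the file uses a different format.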
Dataset description 2
Research dataset 2 (dataset2.csv) contains 10,000 electronic health records of smokers with the
following variables:
Variable            Description
id                  ID
sex                 Sex
age                 Age at baseline
education           Level of education: values 1 (low) to 5 (high)
n_cigarettes        Number of cigarettes smoked per day
CVD                 Prevalent CVD at baseline
dementia            Prevalent dementia at baseline
diuretics           Use of diuretics at baseline
bmi                 Body mass index (kg/m²), measured at baseline
smoking_cessation   Quit smoking between baseline and the end of follow-up
bmi_ch_percent      BMI change (%)
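Because the education variable is documented as taking values 1 to 5, a quick range check can flag records that fall outside it. This is a sketch under assumptions: it treats the values as integer strings in a CSV column named education, the helper name out_of_range_education is hypothetical, and the sample rows are made up.

```python
import csv
import io


def out_of_range_education(rows, low=1, high=5):
    """Return ids of records whose education is not an integer in [low, high]."""
    bad_ids = []
    for row in rows:
        value = (row.get("education") or "").strip()
        # Missing, non-numeric, and out-of-range values are all flagged
        if not (value.isdigit() and low <= int(value) <= high):
            bad_ids.append(row["id"])
    return bad_ids


# Tiny made-up sample, standing in for dataset2.csv
sample = io.StringIO(
    "id,education,n_cigarettes\n"
    "1,3,10\n"
    "2,6,20\n"
    "3,,5\n"
)
bad = out_of_range_education(csv.DictReader(sample))
```

Flagged ids are returned rather than dropped, so the decision on how to handle missing or invalid values stays with the analyst.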