eCite Digital Repository

Predicting deseasonalised serum 25 hydroxy vitamin D concentrations in the D-Health Trial: An analysis using boosted regression trees


Waterhouse, M and Baxter, C and Duarte Romero, B and McLeod, DSA and English, DR and Armstrong, BK and Clarke, MW and Ebeling, PR and Hartel, G and Kimlin, MG and O'Connell, RL and Pham, H and Rodney Harris, RM and van der Pols, JC and Venn, AJ and Webb, PM and Whiteman, DC and Neale, RE, Predicting deseasonalised serum 25 hydroxy vitamin D concentrations in the D-Health Trial: An analysis using boosted regression trees, Contemporary Clinical Trials, 104 pp. 1-11. ISSN 1551-7144 (2021) [Refereed Article]

Copyright Statement

© 2021 Elsevier

DOI: 10.1016/j.cct.2021.106347


Background: The D-Health Trial aims to determine whether monthly high-dose vitamin D supplementation can reduce the mortality rate and prevent cancer. We did not have adequate statistical power for subgroup analyses, so could not justify the high cost of collecting blood samples at baseline. To enable future exploratory analyses stratified by baseline vitamin D status, we developed models to predict baseline serum 25 hydroxy vitamin D [25(OH)D] concentration.

Methods: We used data and serum 25(OH)D concentrations from participants who gave a blood sample during the trial for compliance monitoring and were randomised to placebo. Data were partitioned into training (80%) and validation (20%) datasets. Deseasonalised serum 25(OH)D concentrations were dichotomised using cut-points of 50, 60 and 75 nmol/L. We fitted boosted regression tree models, based on 13 predictors, and evaluated model performance using the validation data.
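
The partitioning and dichotomisation steps described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the random seed, and the synthetic concentrations are assumptions made for illustration only, and the boosted regression tree fitting itself is omitted. Only the 80/20 split and the cut-points of 50, 60 and 75 nmol/L come from the abstract.

```python
import random

# Cut-points in nmol/L, as stated in the abstract.
CUT_POINTS = (50, 60, 75)


def split_train_validation(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into (training, validation).

    The seed and the simple random shuffle are assumptions; the abstract
    does not say how the partition was drawn.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def dichotomise(concentration, cut_point):
    """Return 1 ('low') if the concentration is below the cut-point, else 0."""
    return 1 if concentration < cut_point else 0


# Synthetic deseasonalised 25(OH)D values (nmol/L), NOT trial data.
rng = random.Random(42)
concentrations = [rng.gauss(77, 20) for _ in range(2235)]

training, validation = split_train_validation(concentrations)

for cut in CUT_POINTS:
    share_low = sum(dichotomise(c, cut) for c in training) / len(training)
    print(f"training share below {cut} nmol/L: {share_low:.3f}")
```

Each cut-point yields its own binary outcome, so a separate classifier would be fitted and validated for each of the three thresholds.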

Results: The training and validation datasets had 1788 (10.5% <50 nmol/L, 23.1% <60 nmol/L, and 48.8% <75 nmol/L) and 447 (11.9% <50 nmol/L, 25.7% <60 nmol/L, and 49.2% <75 nmol/L) samples, respectively. Ambient UV radiation and total intake of vitamin D were the strongest predictors of 'low' serum 25(OH)D concentration. The areas under the receiver operating characteristic curve were 0.71, 0.70, and 0.66 for cut-points of <50, <60 and <75 nmol/L, respectively.
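
For readers unfamiliar with the performance measure reported above, the area under the receiver operating characteristic curve equals the probability that a randomly chosen 'low' sample receives a higher predicted score than a randomly chosen 'not low' sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the toy labels and scores are hypothetical, and this is not necessarily how the authors computed their values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0/1, where 1 marks a 'low' 25(OH)D sample.
    scores: predicted probabilities (or any scores) of being 'low'.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example with made-up labels and scores.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.1, 0.4, 0.6]
print(roc_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which places the reported 0.66 to 0.71 in the modest range.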

Conclusions: We exploited compliance monitoring data to develop models to predict serum 25(OH)D concentration for D-Health participants at baseline. This approach may prove useful in other trial settings where there is an obstacle to exhaustive data collection.

Item Details

Item Type: Refereed Article
Research Division: Health Sciences
Research Group: Epidemiology
Research Field: Epidemiological methods
Objective Division: Health
Objective Group: Clinical health
Objective Field: Clinical health not elsewhere classified
UTAS Author: Venn, AJ (Professor Alison Venn)
ID Code: 143478
Year Published: 2021
Web of Science® Times Cited: 5
Deposited By: Menzies Institute for Medical Research
Deposited On: 2021-03-19
Last Modified: 2021-04-26
