eCite Digital Repository

Use of a novel non-parametric version of DEPTH to identify genomic regions associated with prostate cancer risk

Citation

MacInnis, RJ and Schmidt, DF and Makalic, E and Severi, G and FitzGerald, LM and Reumann, M and Kapuscinski, MK and Kowalczyk, A and Zhou, Z and Goudey, B and Qian, G and Bui, QM and Park, DJ and Freeman, A and Southey, MC and Amin Al Olama, A and Kote-Jarai, Z and Eeles, RA and Hopper, JL and Giles, GG, for the UK Genetic Prostate Cancer Study Collaborators, Use of a novel non-parametric version of DEPTH to identify genomic regions associated with prostate cancer risk, Cancer Epidemiology, Biomarkers and Prevention, 25, (12) pp. 1619-1624. ISSN 1055-9965 (2016) [Refereed Article]

Copyright Statement

Copyright 2016 American Association for Cancer Research

DOI: 10.1158/1055-9965.EPI-16-0301

Abstract

BACKGROUND: We have developed a GWAS analysis method called DEPTH (DEPendency of association on the number of Top Hits). It identifies genomic regions potentially associated with disease by considering overlapping groups of contiguous markers (e.g., single nucleotide polymorphisms [SNPs]) across the genome. DEPTH is a machine learning algorithm for feature ranking of ultra-high-dimensional datasets, built from well-established statistical tools such as bootstrapping, penalised regression and decision trees. Marginal regression considers each SNP individually. DEPTH instead ranks groups of SNPs by their joint strength of association with the outcome. Our aim was to compare the performance of DEPTH with that of standard logistic regression analysis.
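The abstract does not spell out DEPTH's scoring steps. The Python sketch below is only a rough illustration of the bookkeeping it describes: overlapping windows of contiguous SNPs, bootstrap replicates, and counting how often each window lands among the top hits.

Everything in it is an assumption, not the authors' method. This includes:
- the hypothetical function names (`sliding_windows`, `marginal_scores`, `window_top_hit_frequency`, `make_toy_data`);
- the window width;
- the per-SNP score, a squared difference in mean allele dosage between cases and controls.

That per-SNP score is a crude stand-in for the per-SNP (marginal) test. Summing it over a window is likewise not the joint penalised-regression and decision-tree modelling that DEPTH uses.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A window is (first SNP index, last SNP index), inclusive.
Window = Tuple[int, int]


def sliding_windows(n_snps: int, width: int, step: int = 1) -> List[Window]:
    """Groups of contiguous markers; consecutive windows overlap when step < width."""
    return [(s, s + width - 1) for s in range(0, n_snps - width + 1, step)]


def marginal_scores(genos: Sequence[Sequence[int]],
                    cases: Sequence[int], controls: Sequence[int]) -> List[float]:
    """Per-SNP (marginal) score: squared difference in mean allele dosage (0/1/2)
    between cases and controls. A crude stand-in for a per-SNP regression test."""
    scores = []
    for j in range(len(genos[0])):
        m_case = sum(genos[i][j] for i in cases) / len(cases)
        m_ctrl = sum(genos[i][j] for i in controls) / len(controls)
        scores.append((m_case - m_ctrl) ** 2)
    return scores


def window_top_hit_frequency(genos: Sequence[Sequence[int]], status: Sequence[int],
                             width: int = 3, top_k: int = 1,
                             n_boot: int = 200, seed: int = 0) -> Dict[Window, float]:
    """Fraction of bootstrap replicates in which each window ranks in the top_k.

    Hypothetical scoring: a window's score is the sum of its per-SNP scores,
    which is NOT a joint model of the SNPs in the window."""
    rng = random.Random(seed)
    case_idx = [i for i, y in enumerate(status) if y == 1]
    ctrl_idx = [i for i, y in enumerate(status) if y == 0]
    windows = sliding_windows(len(genos[0]), width)
    hits: Counter = Counter()
    for _ in range(n_boot):
        # Stratified bootstrap: resample cases and controls separately,
        # so neither group is ever empty.
        boot_cases = [rng.choice(case_idx) for _ in case_idx]
        boot_ctrls = [rng.choice(ctrl_idx) for _ in ctrl_idx]
        snp = marginal_scores(genos, boot_cases, boot_ctrls)
        ranked = sorted(windows, key=lambda w: sum(snp[w[0]:w[1] + 1]), reverse=True)
        hits.update(ranked[:top_k])  # count the top_k windows in this replicate
    return {w: hits[w] / n_boot for w in windows}


def make_toy_data(n_per_group: int = 20, n_snps: int = 30,
                  causal: Tuple[int, ...] = (10, 11, 12)):
    """Hypothetical toy genotypes: only the SNPs in `causal` differ by status."""
    genos, status = [], []
    for i in range(2 * n_per_group):
        is_case = i < n_per_group
        row = []
        for j in range(n_snps):
            if j in causal:
                # Cases mostly carry 2 risk alleles, controls mostly 0.
                row.append(1 if i % 4 == 0 else (2 if is_case else 0))
            else:
                row.append((i + j) % 3)  # pattern unrelated to status
        genos.append(row)
        status.append(1 if is_case else 0)
    return genos, status


if __name__ == "__main__":
    genos, status = make_toy_data()
    freq = window_top_hit_frequency(genos, status)
    for w in sorted(freq, key=freq.get, reverse=True)[:3]:
        print(w, freq[w])
```

On the toy data, the window covering SNPs 10 to 12 should be the most frequent top hit. Varying `top_k` is one way to mimic the "number of top hits" in DEPTH's name, but this sketch makes no claim about how DEPTH actually uses that quantity.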

METHODS: We selected 1,854 prostate cancer cases and 1,894 controls from the UK for whom 541,129 SNPs were measured using the Illumina Infinium HumanHap550 array. Confirmation was sought using 4,152 cases and 2,874 controls, ascertained from the UK and Australia, for whom 211,155 SNPs were measured using the iCOGS Illumina Infinium array.

RESULTS: From the DEPTH analysis, we identified 14 previously reported regions associated with prostate cancer risk, five of which would not have been identified by conventional logistic regression. We also identified 112 novel putative susceptibility regions.

CONCLUSIONS: DEPTH can reveal new risk-associated regions that would not have been identified using a conventional logistic regression analysis of individual SNPs.

IMPACT: This study demonstrates that the DEPTH algorithm could identify additional genetic susceptibility regions that merit further investigation.

Item Details

Item Type: Refereed Article
Keywords: genome-wide association studies, machine learning algorithm, decision trees, single nucleotide polymorphism, prostate cancer
Research Division: Biological Sciences
Research Group: Genetics
Research Field: Epigenetics (incl. Genome Methylation and Epigenomics)
Objective Division: Health
Objective Group: Clinical Health (Organs, Diseases and Abnormal Conditions)
Objective Field: Cancer and Related Disorders
Author: FitzGerald, LM (Dr Liesel Fitzgerald)
ID Code: 111141
Year Published: 2016
Deposited By: Menzies Institute for Medical Research
Deposited On: 2016-08-31
Last Modified: 2017-11-02