eCite Digital Repository

Comparison of pre-processing methodologies for Illumina 450k methylation array data in familial analyses


Cazaly, E and Thomson, R and Marthick, JR and Holloway, AF and Charlesworth, J and Dickinson, JL, Comparison of pre-processing methodologies for Illumina 450k methylation array data in familial analyses, Clinical Epigenetics, 8, (1) Article 75. ISSN 1868-7083 (2016) [Refereed Article]


Copyright Statement

Copyright 2016 The Authors. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)

DOI: 10.1186/s13148-016-0241-2


BACKGROUND: Human methylome mapping in health and disease states has largely relied on Illumina Human Methylation 450k array (450k array) technology. Accompanying this has been the necessary evolution of analysis pipelines to facilitate data processing. The majority of these pipelines, however, cater for experimental designs where matched 'controls' or 'normal' samples are available. Experimental designs where no appropriate 'reference' exists remain challenging. Herein, we use data generated from our study of the inheritance of methylome profiles in families to evaluate the performance of eight normalisation pre-processing methods. Fifty individual samples representing four families were interrogated on five 450k array BeadChips. Eight normalisation methods were tested using qualitative and quantitative metrics, to assess efficacy and suitability.

RESULTS: Stratified quantile normalisation (QN) combined with ComBat was consistently found to be the most appropriate method when assessed using density, MDS and cluster plots. This was supported quantitatively by ANOVA on the first principal component, where the effect of batch went from significant (p < 0.01) to non-significant (p = 0.97) after stratified QN and ComBat. Median absolute differences between replicated samples were lowest after stratified QN and ComBat, as were the standard error measures on known imprinted regions. Biological information was preserved after normalisation, as indicated by the maintenance of a significant association between a known mQTL and methylation (p = 1.05e-05).
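Two of the quantitative checks named above, the median absolute difference between replicated samples and a one-way ANOVA of first-principal-component scores by batch, can be illustrated with a minimal sketch. The abstract does not describe the implementation, so the function names, the grouping by BeadChip and all input values below are hypothetical. The sketch also returns only the F statistic and does not compute a p-value.

```python
from statistics import mean, median


def median_absolute_difference(rep_a, rep_b):
    """Median of |a - b| across probes for one pair of replicated samples."""
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same probes")
    return median(abs(a - b) for a, b in zip(rep_a, rep_b))


def one_way_anova_f(scores_by_batch):
    """F statistic for a one-way ANOVA of scores (e.g. PC1) grouped by batch."""
    groups = list(scores_by_batch.values())
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variation of batch means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own batch mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical beta values for one replicate pair (four probes).
replicate_1 = [0.10, 0.50, 0.90, 0.30]
replicate_2 = [0.12, 0.47, 0.90, 0.35]
mad = median_absolute_difference(replicate_1, replicate_2)

# Hypothetical PC1 scores grouped by BeadChip (batch).
pc1_by_chip = {"chip1": [1.0, 2.0, 3.0], "chip2": [2.0, 3.0, 4.0]}
f_stat = one_way_anova_f(pc1_by_chip)
```

Under this reading, a smaller median absolute difference means that replicates agree more closely after normalisation, and a smaller F statistic means that less of the variation along the first principal component is explained by batch.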

CONCLUSIONS: A strategy combining stratified QN with ComBat is appropriate for use in analyses where no reference sample is available but preservation of biological variation is paramount. There is great potential for use of 450k array data to further our understanding of the methylome in a variety of similar settings. Such advances will rely on determining appropriate methodologies for processing these data, such as those established here.

Item Details

Item Type: Refereed Article
Keywords: 450k, Array, Familial data, Methylation, Normalisation, Pre-processing pipeline
Research Division: Biological Sciences
Research Group: Genetics
Research Field: Epigenetics (incl. genome methylation and epigenomics)
Objective Division: Expanding Knowledge
Objective Group: Expanding knowledge
Objective Field: Expanding knowledge in the biological sciences
UTAS Author: Cazaly, E (Ms Emma Cazaly)
UTAS Author: Thomson, R (Dr Russell Thomson)
UTAS Author: Marthick, JR (Mr James Marthick)
UTAS Author: Holloway, AF (Professor Adele Holloway)
UTAS Author: Charlesworth, J (Dr Jac Charlesworth)
UTAS Author: Dickinson, JL (Professor Joanne Dickinson)
ID Code: 110224
Year Published: 2016
Web of Science® Times Cited: 6
Deposited By: Menzies Institute for Medical Research
Deposited On: 2016-07-20
Last Modified: 2022-08-25
Downloads: 213