eCite Digital Repository

Random Forests machine learning applied to gas chromatography mass spectrometry derived average mass spectrum data sets for classification and characterisation of essential oils

Citation

Lebanov, L and Tedone, L and Ghiasvand, A and Paull, B, Random Forests machine learning applied to gas chromatography - mass spectrometry derived average mass spectrum data sets for classification and characterisation of essential oils, Talanta, 208 Article 120471. ISSN 0039-9140 (2019) [Refereed Article]



DOI: 10.1016/j.talanta.2019.120471

Abstract

Differences in chemical profiles of various essential oils (EOs) come from the fact that each plant species and chemotype has a distinctive secondary metabolism. Therefore, these differences can be used as chemical markers for EO classification and determination of their quality. Herein, the Random Forests (RF) machine learning algorithm was applied to the classification of 20 different EOs. From three-way raw gas chromatography - mass spectra data, total chromatogram average mass spectra (TCAMS) and segment average mass spectra (SAMS) were created. TCAMS was generated by averaging the response of each m/z over the whole chromatogram and SAMS by averaging the response of each fragment across a certain time segment within the chromatogram. The RF model was applied to the two data sets and optimised through the evaluation of pre-processed data, number of trees, and number of variables used in each node split. The performance of the model was evaluated through a cross-validation process, repeated 50 times by dividing the whole sample set into training and validation subsets. The calculated average out-of-bag error (OOBE) over 50 different training TCAMS data sets was 3.22 ± 1.29%, while for SAMS it was found to be 2.28 ± 1.33%. The minimal number of variables necessary for EO classification was determined by a nested cross-validation process, in which the number of variables was reduced by 10% in each step. It was shown that the TCAMS data set with 6 variables had prediction power similar to that of SAMS with 30 variables. OOBE for classification of 20 EOs was 2.89 ± 1.44% and 3.70 ± 1.73% for TCAMS and SAMS, respectively. Proximity between samples was used to evaluate their qualities. Samples with greater intra-class proximity had good similarity, while lower proximities indicated greater variations in the chemical profiles. The SAMS data set showed superior potential for quality assurance, compared with TCAMS.
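
The two averaging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that one GC-MS run is stored as a matrix with one row per scan (time point) and one column per m/z channel, that SAMS segments are equal-width and contiguous, and that the per-segment spectra are concatenated into a single feature vector. The abstract states none of these details, and the function names `tcams` and `sams` are hypothetical.

```python
from typing import List

# Assumed layout: rows = scans (time points), columns = m/z channels.
Matrix = List[List[float]]


def tcams(run: Matrix) -> List[float]:
    """Average the response of each m/z channel over all scans."""
    n_scans = len(run)
    if n_scans == 0:
        raise ValueError("run must contain at least one scan")
    return [sum(channel) / n_scans for channel in zip(*run)]


def sams(run: Matrix, n_segments: int) -> List[float]:
    """Average each m/z channel within consecutive time segments.

    Assumption: segments are equal-width and the per-segment average
    spectra are concatenated into one feature vector.
    """
    n_scans = len(run)
    if not 1 <= n_segments <= n_scans:
        raise ValueError("n_segments must be between 1 and the number of scans")
    features: List[float] = []
    for s in range(n_segments):
        start = s * n_scans // n_segments
        stop = (s + 1) * n_scans // n_segments
        features.extend(tcams(run[start:stop]))
    return features


# Toy run: 4 scans, 2 m/z channels.
toy_run = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
whole_run_spectrum = tcams(toy_run)
segment_spectra = sams(toy_run, n_segments=2)
```

In this toy case `tcams` returns one value per m/z channel, while `sams` returns one value per channel per segment, so SAMS retains some retention-time information at the cost of a longer feature vector.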

Item Details

Item Type: Refereed Article
Keywords: essential oil, random forests classification, average mass spectrum, quality assurance, gas chromatography, mass spectrometry
Research Division: Chemical Sciences
Research Group: Analytical Chemistry
Research Field: Analytical Spectrometry
Objective Division: Expanding Knowledge
Objective Group: Expanding Knowledge
Objective Field: Expanding Knowledge in the Chemical Sciences
UTAS Author: Lebanov, L (Mr Leo Lebanov)
UTAS Author: Tedone, L (Ms Laura Tedone)
UTAS Author: Ghiasvand, A (Professor Alireza Ghiasvand)
UTAS Author: Paull, B (Professor Brett Paull)
ID Code: 137014
Year Published: 2019
Deposited By: Chemistry
Deposited On: 2020-01-28
Last Modified: 2020-01-29
Downloads: 0
