File(s) under permanent embargo
Experimental investigation of three machine learning algorithms for ITS dataset
conference contribution
posted on 2023-05-23, 04:50 authored by Yearwood, JL, Byeong Kang, Kelarev, A
This article experimentally investigates the performance of three machine learning algorithms on the ITS dataset, measured by their ability to agree with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, for which rather sophisticated alignment scores have to be used as the measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space, so novel machine learning approaches are needed to analyse datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. All three algorithms turn out to be efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, on which the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers, is also presented.
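The abstract's central constraint is that only pairwise scores between sequences are available, with no coordinates to average. The Python sketch below illustrates what that means for two of the classifiers it names. It is not the authors' implementation. `toy_dissimilarity` is a hypothetical stand-in for a real alignment score, the sequences and labels are invented, and reading "discrete k-means" as "each class centre must be an actual dataset element" is an assumption. The k-committees classifier is not sketched because the abstract does not define it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Dissimilarity = Callable[[str, str], float]


def toy_dissimilarity(a: str, b: str) -> float:
    """Hypothetical stand-in for an alignment score (assumption, not from the paper):
    position-wise mismatches plus the difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return float(mismatches + abs(len(a) - len(b)))


def discrete_centre(members: Sequence[str], d: Dissimilarity) -> str:
    """Pick the member with the smallest summed dissimilarity to its class.
    No averaging is needed, so this works without vector coordinates."""
    return min(members, key=lambda c: sum(d(c, m) for m in members))


def nearest_neighbour(query: str, labelled: List[Tuple[str, str]], d: Dissimilarity) -> str:
    """Return the label of the single closest labelled sequence."""
    _, label = min(labelled, key=lambda pair: d(query, pair[0]))
    return label


def nearest_centre(query: str, centres: Dict[str, str], d: Dissimilarity) -> str:
    """Return the label whose discrete centre is closest to the query."""
    return min(centres, key=lambda label: d(query, centres[label]))


# Invented toy data: (sequence, class label).
training = [
    ("ACGTAC", "A"), ("ACGTAA", "A"), ("ACGAAC", "A"),
    ("TTGCAT", "B"), ("TTGCAA", "B"), ("TTGGAT", "B"),
]

# Group sequences by class, then choose one discrete centre per class.
by_class: Dict[str, List[str]] = {}
for seq, label in training:
    by_class.setdefault(label, []).append(seq)
centres = {label: discrete_centre(seqs, toy_dissimilarity) for label, seqs in by_class.items()}

if __name__ == "__main__":
    for query in ("ACGTTC", "TTGCA"):
        print(query,
              nearest_neighbour(query, training, toy_dissimilarity),
              nearest_centre(query, centres, toy_dissimilarity))
```

Both classifiers call only `d(query, candidate)`, so any score function with the same signature could be swapped in without the sequences ever being embedded in a vector space.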
History
Publication title
Proceedings of Future Generation Information Technology
Editors
Lee YH, Kim TH, Fang WC, Slezak D
Pagination
308-316
ISBN
978-3-642-10508-1
Department/School
School of Information and Communication Technology
Publisher
Springer-Verlag
Place of publication
New York, USA
Event title
Future Generation Information Technology
Event Venue
Jeju Island, Korea
Date of Event (Start Date)
2009-12-10
Date of Event (End Date)
2009-12-12
Rights statement
The original publication is available at http://www.springerlink.com
Repository Status
Restricted