eCite Digital Repository

In Silico Detection and Characterisation of Biological Regulatory Elements

Citation

Uren, PJ, In Silico Detection and Characterisation of Biological Regulatory Elements (2009) [PhD]

Abstract

This thesis concentrates on the detection of biological transcriptional regulatory elements through computational methods. Current approaches focus on a representation of DNA that is essentially an abstraction to a string over a four-letter alphabet. This fails to make explicit a large amount of relevant information describing how the molecule functions in its cellular environment. A major contribution of this work is the exploration of existing higher-order physical and chemical properties as a representational scheme for DNA. Classification mechanisms based on such a representation are evaluated on several tasks associated with the recognition and localisation of transcriptional control elements. The computational approaches used come from a variety of backgrounds, but the focus is primarily on machine learning methods.

It is shown that promoters can be effectively predicted using a representation based on higher-order physical and chemical properties. This representation also allows more explicit insight into the biological functioning of the promoter by highlighting which regions are important for classification with respect to each model. The physico-chemical representation is also shown to be effective in clustering the transcription factor binding sites of a single factor into sub-groups. These groups are used to construct weight matrices which demonstrate improved binding site classification over their original counterparts. The newly constructed composite matrices are also shown to produce fewer positive predictions while maintaining equivalent classification performance when used within a promoter prediction scheme.

Motif-based representations for characterising promoters are also prevalent. These have traditionally focused on a relatively small, often fixed, number of core promoter elements. While this is easily mapped into a supervised learning scenario, the more challenging task of using a variable number of motifs is considered within this work. An approach is presented to handle the scenario in which neither the number of elements nor their frequency of occurrence is known a priori. This representation, handled via the multiple instance learning paradigm, is shown to be effective when combined with physico-chemical property-based promoter prediction.

Finally, comparative approaches also exist for the identification of regulatory elements and are often heavily reliant on a multiple sequence alignment algorithm. Such an algorithm, using simulated annealing to search for an optimal alignment ordering and based on a recent solution to the aligning alignments problem, is introduced within this work. This thesis explores the application of the new algorithm to problems involving both protein and nucleotide data.
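
For readers unfamiliar with the weight matrices mentioned in the abstract, the following is a minimal sketch of how a log-odds position weight matrix might be built from a group of aligned binding sites and used to scan a sequence. It is a generic textbook construction, not the method used in the thesis. The function names (`build_pwm`, `score`, `best_hit`), the pseudocount, the uniform background frequency, and the example sites are all illustrative assumptions.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites.

    Assumptions (not from the thesis): a uniform background frequency
    and a simple additive pseudocount to avoid log(0).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical aligned sites, invented purely for illustration.
example_sites = ["TATAAA", "TATATA", "TATAAT", "TATAAA"]
example_pwm = build_pwm(example_sites)
hit_score, hit_start = best_hit(example_pwm, "GGCTATAAAGGC")
```

Under this sketch, one matrix would be built per sub-group of sites, and a candidate window would be scored against each of them. How the thesis combines sub-group matrices into composite matrices is not specified in the abstract.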

Item Details

Item Type: PhD
Research Division: Information and Computing Sciences
Research Group: Artificial Intelligence and Image Processing
Research Field: Pattern Recognition and Data Mining
Objective Division: Expanding Knowledge
Objective Group: Expanding Knowledge
Objective Field: Expanding Knowledge in the Information and Computing Sciences
Author: Uren, PJ (Mr Philip Uren)
ID Code: 60925
Year Published: 2009
Deposited By: Information and Communication Technology
Deposited On: 2010-02-22
Last Modified: 2010-02-24
Downloads: 0