A simple maximization model inspired by algorithms for the organization of genetic candidates in bacterial DNA

Published online by Cambridge University Press: 08 September 2016

Andrew G. Hart*, Universidad de Chile
Servet Martínez*, Universidad de Chile
Leonardo Videla*, Universidad de Chile

* Postal address: Departamento de Ingeniería Matemática and Centro de Modelamiento Matemático, UMR 2071 CNRS-UCHILE, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Casilla 170-3, Correo 3, Santiago, Chile.

Abstract

We propose a simple model for the interaction between gene candidates in the two strands of bacterial DNA (deoxyribonucleic acid). Our model assumes that each ‘final’ gene lies in one of the two strands, that final genes do not overlap (in bacteria, only a small percentage of overlap occurs), and that the final genes maximize the occupancy rate, defined as the proportion of the genome occupied by coding zones. We are more concerned with describing the organization and distribution of genes in bacterial DNA than with the very hard problem of identifying genes. To this end, we propose an algorithm that selects the final genes according to this maximization criterion. We apply the maximization procedure to a Markovian representation of the genic and intergenic zones within the DNA strands and study the graphical and probabilistic properties of the resulting model. We also develop theoretical bounds on the occupancy rate (which, in our view, is a rather intractable quantity), and we use the model to compute quantities relevant to the Escherichia coli genome and compare them with annotation data. Although this work focuses on genomic modelling, the proposed model is not restricted to this setting: it can also be used to model other resource allocation problems.
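The abstract does not spell out the authors' selection algorithm, so the following is only a rough illustration of the maximization criterion, not the paper's method. It swaps in standard weighted interval scheduling. Each strand is generated by a hypothetical two-state Markov chain (intergenic/genic) with assumed switching probabilities `p_enter` and `p_leave`. Candidates from both strands are treated as half-open intervals on a common coordinate axis. A dynamic program then keeps a pairwise non-overlapping subset of maximal total length. The occupancy rate is that total divided by the genome length. All function names and parameter values are illustrative assumptions.

```python
import random
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a shared coordinate axis


def simulate_strand(n: int, p_enter: float, p_leave: float,
                    rng: random.Random) -> List[Interval]:
    """Hypothetical strand model (an assumption, not the paper's): a two-state
    Markov chain on sites 0..n-1, starting intergenic. Each maximal run of
    genic sites becomes one candidate interval."""
    intervals: List[Interval] = []
    in_gene, start = False, 0
    for i in range(n):
        if in_gene:
            if rng.random() < p_leave:      # genic -> intergenic at site i
                intervals.append((start, i))
                in_gene = False
        elif rng.random() < p_enter:        # intergenic -> genic at site i
            in_gene, start = True, i
    if in_gene:                             # close a run reaching the end
        intervals.append((start, n))
    return intervals


def max_occupancy(candidates: List[Interval]) -> Tuple[int, List[Interval]]:
    """Weighted interval scheduling (a stand-in technique): choose pairwise
    non-overlapping intervals, regardless of strand, maximizing total length."""
    cands = sorted(candidates, key=lambda iv: iv[1])
    ends = [e for _, e in cands]
    best = [0] * (len(cands) + 1)  # best[k]: optimum over the first k intervals
    for k, (s, e) in enumerate(cands, start=1):
        # Earlier intervals ending at or before s are exactly the compatible ones.
        j = bisect_right(ends, s, 0, k - 1)
        best[k] = max(best[k - 1], best[j] + (e - s))
    # Backtrack to recover one optimal selection.
    chosen: List[Interval] = []
    k = len(cands)
    while k > 0:
        s, e = cands[k - 1]
        j = bisect_right(ends, s, 0, k - 1)
        if best[j] + (e - s) >= best[k - 1]:
            chosen.append((s, e))
            k = j
        else:
            k -= 1
    chosen.reverse()
    return best[-1], chosen


if __name__ == "__main__":
    rng = random.Random(2006)
    n = 10_000                     # hypothetical genome length
    forward = simulate_strand(n, 0.01, 0.001, rng)
    reverse = simulate_strand(n, 0.01, 0.001, rng)
    covered, genes = max_occupancy(forward + reverse)
    print(f"{len(genes)} genes kept, occupancy rate = {covered / n:.3f}")
```

Any exact solver for the same criterion would do here. Interval scheduling is used only because it runs in O(m log m) for m candidates and makes the "maximize occupancy without overlap" rule concrete.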

Type: General Applied Probability
Copyright © Applied Probability Trust 2006