Segmentation of the Poisson and negative binomial rate models: a penalized estimator

Alice Cleynen; Emilie Lebarbier

doi:10.1051/ps/2014005

Published online by Cambridge University Press: 22 October 2014

Alice Cleynen: AgroParisTech UMR518, Paris 5e, France; alice.cleynen@agroparistech.fr
Emilie Lebarbier: INRA UMR518, Paris 5e, France; emilie.lebarbier@agroparistech.fr

Abstract

We consider the segmentation problem of Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, an important issue remains the choice of the number of segments. To this end, we propose a penalized-likelihood estimator where the penalty function is constructed in a non-asymptotic context following the works of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed using simulated and real datasets in the RNA-seq data analysis context.
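As a rough illustration of the idea summarized above, the following Python sketch segments a count series under a piecewise-constant Poisson model and chooses the number of segments by minimizing a penalized negative log-likelihood. It is only a sketch under stated assumptions: the function names, the constant `c`, and the penalty shape `c * K * (1 + log(n / K))` are hypothetical choices made for illustration and are not the penalty function derived in the article, and the simple quadratic dynamic program is not the authors' algorithm.

```python
import math


def poisson_segment_cost(prefix, i, j):
    """Poisson negative log-likelihood of y[i:j] at the MLE rate.

    The term sum(log(y!)) is dropped because it does not depend on
    the segmentation.
    """
    s = prefix[j] - prefix[i]  # sum of counts in the segment
    n = j - i                  # segment length
    if s == 0:
        return 0.0
    return s - s * math.log(s / n)


def best_segmentations(y, k_max):
    """Dynamic programming over segment boundaries.

    cost[k][j] is the minimal cost of splitting y[:j] into k segments;
    back[k][j] is the start index of the last of those segments.
    """
    n = len(y)
    prefix = [0]
    for v in y:
        prefix.append(prefix[-1] + v)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    cost[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + poisson_segment_cost(prefix, i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    back[k][j] = i
    return cost, back


def select_segmentation(y, k_max, c=2.0):
    """Pick the number of segments minimizing cost + penalty.

    ASSUMPTION: the penalty c * K * (1 + log(n / K)) is an illustrative
    shape only, not the penalty constructed in the article.
    """
    n = len(y)
    cost, back = best_segmentations(y, k_max)

    def criterion(k):
        return cost[k][n] + c * k * (1.0 + math.log(n / k))

    k_hat = min(range(1, k_max + 1), key=criterion)
    # Walk the back-pointers to recover the segment start indices.
    starts = []
    j = n
    for k in range(k_hat, 0, -1):
        j = back[k][j]
        starts.append(j)
    change_points = sorted(starts)[1:]  # drop the leading 0
    return k_hat, change_points
```

Calling `select_segmentation(y, k_max=5)` on a list of non-negative integer counts returns the selected number of segments and the indices at which new segments start. In practice the constant `c` would have to be calibrated on the data, and a negative binomial version would replace `poisson_segment_cost` with the corresponding likelihood.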

Keywords

Distribution estimation; change-point detection; count data (RNA-seq); Poisson and negative binomial distributions; model selection

Type: Research Article
Information: ESAIM: Probability and Statistics, Volume 18, 2014, pp. 750–769

DOI: https://doi.org/10.1051/ps/2014005
Copyright: © EDP Sciences, SMAI 2014

References

H. Akaike, Information Theory and Extension of the Maximum Likelihood Principle. Second int. Symp. Inf. Theory (1973) 267–281.

N. Akakpo, Estimating a discrete distribution via histogram selection. ESAIM: PS 15 (2011) 1–29.

S. Arlot and P. Massart, Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. 10 (2009) 245–279.

Y. Baraud and L. Birgé, Estimating the intensity of a random measure by histogram type estimators. Probab. Theory Relat. Fields 143 (2009) 239–284.

A. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113 (1999) 301–413.

C. Biernacki, G. Celeux and G. Govaert, Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 719–725.

L. Birgé, Model selection for Poisson processes. In Asymptotics: particles, processes and inverse problems, Vol. 55 of IMS Lect. Notes Monogr. Ser. Beachwood, OH: Inst. Math. Statist. (2007) 32–64.

L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam. New York, Springer (1997) 55–87.

L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203–268.

L. Birgé and P. Massart, Minimal penalties for Gaussian model selection. Probab. Theory Relat. Fields 138 (2007) 33–73.

J.V. Braun, R. Braun and H.G. Müller, Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika 87 (2000) 301–314.

J.V. Braun and H.G. Müller, Statistical methods for DNA sequence segmentation. Stat. Sci. (1998) 142–162.

L. Breiman, J. Friedman, R. Olshen and C. Stone, Classification and Regression Trees. Wadsworth and Brooks (1984).

G. Castellan, Modified Akaike's criterion for histogram density estimation. Technical Report #9961 (1999).

A. Cleynen, M. Koskas, E. Lebarbier, G. Rigaill and S. Robin, Segmentor3IsBack, an R package for the fast and exact segmentation of Seq-data. Algorithms for Molecular Biology (2014).

N. Johnson, A. Kemp and S. Kotz, Univariate Discrete Distributions. John Wiley & Sons, Inc. (2005).

R. Killick and I.A. Eckley, Changepoint: an R package for changepoint analysis. Lancaster University (2011).

E. Lebarbier, Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Process. 85 (2005) 717–736.

T.M. Luong, Y. Rozenholc and G. Nuel, Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model. Comput. Stat. Data Anal. (2013).

P. Massart, Concentration inequalities and model selection. In Lect. Notes Math. Springer Berlin/Heidelberg (2007).

P. Reynaud-Bouret, Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Relat. Fields 126 (2003) 103–153.

G. Rigaill, Pruned dynamic programming for optimal multiple change-point detection. ArXiv:1004.0887 2010, [http://arxiv.org/abs/1004.0887].

G. Rigaill, E. Lebarbier and S. Robin, Exact posterior distributions and model selection criteria for multiple change-point detection problems. Stat. Comput. 22 (2012) 917–929.

D. Risso, K. Schwartz, G. Sherlock and S. Dudoit, GC-Content Normalization for RNA-Seq Data. BMC Bioinform. 12 (2011) 480.

Y.C. Yao, Estimating the number of change-points via Schwarz’ criterion. Stat. Probab. Lett. 6 (1988) 181–189.

N.R. Zhang and D.O. Siegmund, A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics 63 (2007) 22–32.
