Segmentation of the Poisson and negative binomial rate models: a penalized estimator

Published online by Cambridge University Press:  22 October 2014

Alice Cleynen
Affiliation:
AgroParisTech UMR518, Paris 5e, France. alice.cleynen@agroparistech.fr
Emilie Lebarbier
Affiliation:
INRA UMR518, Paris 5e, France; emilie.lebarbier@agroparistech.fr

Abstract

We consider the segmentation problem for Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, the choice of the number of segments remains an important issue. To this end, we propose a penalized-likelihood estimator whose penalty function is constructed in a non-asymptotic framework, following the work of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed on simulated and real datasets in the context of RNA-seq data analysis.
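
To make the idea of a penalized-likelihood choice of the number of segments concrete, here is a minimal Python sketch for the Poisson case. It is an illustration under stated assumptions, not the estimator studied in the article: the penalty shape `k * (c1 + c2 * log(n / k))` and the constants `c1` and `c2` are hypothetical placeholders in the spirit of Birgé and Massart style penalties, and the search is a plain O(K n²) dynamic program rather than any pruned algorithm. The function names are invented for this example.

```python
import math


def segment_cost(prefix, i, j):
    """Negative Poisson log-likelihood of y[i:j] at its MLE rate.

    Terms that do not depend on the segmentation (the sum of the counts
    and the log-factorials) are dropped. The convention 0 * log(0) = 0 is used.
    """
    s = prefix[j] - prefix[i]
    if s == 0:
        return 0.0
    return -s * math.log(s / (j - i))


def best_costs(y, k_max):
    """cost[k][t]: minimal cost of cutting y[:t] into exactly k segments."""
    n = len(y)
    prefix = [0]
    for v in y:
        prefix.append(prefix[-1] + v)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    cost[0][0] = 0.0
    for k in range(1, k_max + 1):
        for t in range(k, n + 1):
            for s in range(k - 1, t):
                c = cost[k - 1][s] + segment_cost(prefix, s, t)
                if c < cost[k][t]:
                    cost[k][t] = c
                    back[k][t] = s  # start index of the last segment
    return cost, back


def select_segmentation(y, k_max, c1=2.0, c2=1.0):
    """Pick the number of segments minimizing cost + penalty.

    ASSUMPTION: the penalty shape and the constants c1, c2 are
    illustrative only; they are not the calibrated penalty of the article.
    """
    n = len(y)
    cost, back = best_costs(y, k_max)

    def penalty(k):
        return k * (c1 + c2 * math.log(n / k))

    best_k = min(range(1, k_max + 1), key=lambda k: cost[k][n] + penalty(k))

    # Walk the back-pointers to recover the segment start indices.
    starts = []
    t = n
    for k in range(best_k, 0, -1):
        t = back[k][t]
        starts.append(t)
    starts.reverse()
    return best_k, starts[1:]  # drop the leading 0; keep change-points only
```

Calling `select_segmentation(y, k_max)` on a list of counts returns the selected number of segments and the indices at which a new segment starts. In practice the penalty constants would have to be calibrated (for example from the data), which this sketch does not attempt.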

Type
Research Article
Copyright
© EDP Sciences, SMAI 2014

References

H. Akaike, Information Theory and Extension of the Maximum Likelihood Principle. Second int. Symp. Inf. Theory (1973) 267–281.
N. Akakpo, Estimating a discrete distribution via histogram selection. ESAIM: PS 15 (2011) 1–29.
S. Arlot and P. Massart, Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. 10 (2009) 245–279.
Y. Baraud and L. Birgé, Estimating the intensity of a random measure by histogram type estimators. Probab. Theory Relat. Fields 143 (2009) 239–284.
A. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113 (1999) 301–413.
C. Biernacki, G. Celeux and G. Govaert, Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 719–725.
L. Birgé, Model selection for Poisson processes. In Asymptotics: particles, processes and inverse problems, Vol. 55 of IMS Lect. Notes Monogr. Ser. Beachwood, OH: Inst. Math. Statist. (2007) 32–64.
L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam. New York, Springer (1997) 55–87.
L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203–268.
L. Birgé and P. Massart, Minimal penalties for Gaussian model selection. Probab. Theory Relat. Fields 138 (2007) 33–73.
J.V. Braun, R. Braun and H.G. Müller, Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika 87 (2000) 301–314.
J.V. Braun and H.G. Müller, Statistical methods for DNA sequence segmentation. Stat. Sci. (1998) 142–162.
L. Breiman, J. Friedman, R. Olshen and C. Stone, Classification and Regression Trees. Wadsworth and Brooks (1984).
G. Castellan, Modified Akaike's criterion for histogram density estimation. Technical Report #9961 (1999).
A. Cleynen, M. Koskas, E. Lebarbier, G. Rigaill and S. Robin, Segmentor3IsBack, an R package for the fast and exact segmentation of Seq-data. Algorithms for Molecular Biology (2014).
N. Johnson, A. Kemp and S. Kotz, Univariate Discrete Distributions. John Wiley & Sons, Inc. (2005).
R. Killick and I.A. Eckley, Changepoint: an R package for changepoint analysis. Lancaster University (2011).
E. Lebarbier, Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Process. 85 (2005) 717–736.
T.M. Luong, Y. Rozenholc and G. Nuel, Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model. Comput. Stat. Data Anal. (2013).
P. Massart, Concentration inequalities and model selection. In Lect. Notes Math. Springer Berlin/Heidelberg (2007).
P. Reynaud-Bouret, Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Relat. Fields 126 (2003) 103–153.
G. Rigaill, Pruned dynamic programming for optimal multiple change-point detection. ArXiv:1004.0887 2010, [http://arxiv.org/abs/1004.0887].
G. Rigaill, E. Lebarbier and S. Robin, Exact posterior distributions and model selection criteria for multiple change-point detection problems. Stat. Comput. 22 (2012) 917–929.
D. Risso, K. Schwartz, G. Sherlock and S. Dudoit, GC-Content Normalization for RNA-Seq Data. BMC Bioinform. 12 (2011) 480.
Y.C. Yao, Estimating the number of change-points via Schwarz' criterion. Stat. Probab. Lett. 6 (1988) 181–189.
N.R. Zhang and D.O. Siegmund, A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics 63 (2007) 22–32.