
Density estimation with quadratic loss: a confidence intervals method

Published online by Cambridge University Press: 25 July 2008

Pierre Alquier*
Affiliation:
Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6, France; Laboratoire de Statistique, CREST, 3 avenue Pierre Larousse, 92240 Malakoff, France. Email: alquier@ensae.fr

Abstract

We propose a feature selection method for density estimation with quadratic loss. This method relies on the study of unidimensional approximation models and on the definition of confidence regions for the density based on these models. It is quite general and includes cases of interest such as the detection of relevant wavelet coefficients or the selection of support vectors in SVM. In the general case, we prove that every selected feature actually improves the performance of the estimator. In the case where features are defined by wavelets, we prove that this method is adaptive and near minimax (up to a log term) in some Besov spaces. We end the paper with simulations indicating that it should be possible to extend the adaptation result to other features.
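
The idea of selecting features through confidence intervals on one-dimensional models can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes an orthonormal cosine basis on [0, 1] in place of wavelets or kernels, a plain Hoeffding bound with a union bound in place of the paper's confidence regions, and a toy target density. All function names (`cosine_feature`, `select_features`, `density_estimate`, `draw_sample`) and parameter values are hypothetical. A feature is kept only when its interval excludes zero, and its coefficient is set to the point of the interval closest to zero, so that on the event where all intervals hold, adding the feature cannot increase the quadratic loss.

```python
import math
import random


def cosine_feature(k, x):
    """k-th element (k >= 1) of the orthonormal cosine basis of L2([0, 1])."""
    return math.sqrt(2.0) * math.cos(math.pi * k * x)


def select_features(sample, m, eps=0.05):
    """Return {k: shrunk coefficient} for features whose interval excludes 0."""
    n = len(sample)
    # Hoeffding radius for a mean of n values in [-sqrt(2), sqrt(2)],
    # with a union bound over the m candidate features (an assumption of
    # this sketch, not the bound used in the paper).
    radius = 2.0 * math.sqrt(math.log(2.0 * m / eps) / n)
    selected = {}
    for k in range(1, m + 1):
        beta_hat = sum(cosine_feature(k, x) for x in sample) / n
        if abs(beta_hat) > radius:
            # Keep the point of the interval closest to 0 (soft thresholding).
            selected[k] = math.copysign(abs(beta_hat) - radius, beta_hat)
    return selected


def density_estimate(selected, x):
    """Estimate f(x) as 1 + sum over selected k of c_k * phi_k(x)."""
    return 1.0 + sum(c * cosine_feature(k, x) for k, c in selected.items())


def draw_sample(n, rng):
    """Rejection sampling from the toy density f(x) = 1 + 0.8 cos(2 pi x)."""
    out = []
    while len(out) < n:
        x, u = rng.random(), rng.random()
        if 1.8 * u <= 1.0 + 0.8 * math.cos(2.0 * math.pi * x):
            out.append(x)
    return out


rng = random.Random(0)
sample = draw_sample(5000, rng)
selected = select_features(sample, m=20)
```

In this toy setting the only nonzero coefficient of the target density is the one on `cosine_feature(2, .)`, so `selected` is expected to contain the key `2` with a coefficient shrunk toward zero by the interval radius.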


Type
Research Article
Copyright
© EDP Sciences, SMAI, 2008

