
References

Published online by Cambridge University Press:  12 February 2019

Martin J. Wainwright
Affiliation: University of California, Berkeley
High-Dimensional Statistics: A Non-Asymptotic Viewpoint, pp. 524–539
Publisher: Cambridge University Press
Print publication year: 2019


Adamczak, R. 2008. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability, 34, 1000–1034.
Adamczak, R., Litvak, A. E., Pajor, A., and Tomczak-Jaegermann, N. 2010. Quantitative estimations of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society, 23, 535–561.
Agarwal, A., Negahban, S., and Wainwright, M. J. 2012. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. Annals of Statistics, 40(2), 1171–1197.
Ahlswede, R., and Winter, A. 2002. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3), 569–579.
Aizerman, M. A., Braverman, E. M., and Rozonoer, L. I. 1964. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25, 821–837.
Akcakaya, M., and Tarokh, V. 2010. Shannon theoretic limits on noisy compressive sampling. IEEE Transactions on Information Theory, 56(1), 492–504.
Alexander, K. S. 1987. Rates of growth and sample moduli for weighted empirical processes indexed by sets. Probability Theory and Related Fields, 75, 379–423.
Alliney, S., and Ruzinsky, S. A. 1994. An algorithm for the minimization of mixed ℓ1 and ℓ2 norms with application to Bayesian estimation. IEEE Transactions on Signal Processing, 42(3), 618–627.
Amini, A. A., and Wainwright, M. J. 2009. High-dimensional analysis of semidefinite relaxations for sparse principal component analysis. Annals of Statistics, 5B, 2877–2921.
Anandkumar, A., Tan, V. Y. F., Huang, F., and Willsky, A. S. 2012. High-dimensional structure learning of Ising models: Local separation criterion. Annals of Statistics, 40(3), 1346–1375.
Anderson, T. W. 1984. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Mathematical Statistics. New York, NY: Wiley.
Ando, R. K., and Zhang, T. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6(December), 1817–1853.
Aronszajn, N. 1950. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68, 337–404.
Assouad, P. 1983. Deux remarques sur l’estimation. Comptes Rendus de l’Académie des Sciences, Paris, 296, 1021–1024.
Azuma, K. 1967. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19, 357–367.
Bach, F., Jenatton, R., Mairal, J., and Obozinski, G. 2012. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1), 1–106.
Bahadur, R. R., and Rao, R. R. 1960. On deviations of the sample mean. Annals of Mathematical Statistics, 31, 1015–1027.
Bai, Z., and Silverstein, J. W. 2010. Spectral Analysis of Large Dimensional Random Matrices. New York, NY: Springer. Second edition.
Baik, J., and Silverstein, J. W. 2006. Eigenvalues of large sample covariance matrices of spiked populations models. Journal of Multivariate Analysis, 97(6), 1382–1408.
Balabdaoui, F., Rufibach, K., and Wellner, J. A. 2009. Limit distribution theory for maximum likelihood estimation of a log-concave density. Annals of Statistics, 62(3), 1299–1331.
Ball, K. 1997. An elementary introduction to modern convex geometry. Pages 1–55 of: Flavors of Geometry. MSRI Publications, vol. 31. Cambridge, UK: Cambridge University Press.
Banerjee, O., El Ghaoui, L., and d’Aspremont, A. 2008. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9(March), 485–516.
Baraniuk, R. G., Cevher, V., Duarte, M. F., and Hegde, C. 2010. Model-based compressive sensing. IEEE Transactions on Information Theory, 56(4), 1982–2001.
Barndorff-Nielson, O. E. 1978. Information and Exponential Families. Chichester, UK: Wiley.
Bartlett, P. L., and Mendelson, S. 2002. Gaussian and Rademacher complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.
Bartlett, P. L., Bousquet, O., and Mendelson, S. 2005. Local Rademacher complexities. Annals of Statistics, 33(4), 1497–1537.
Baxter, R. J. 1982. Exactly Solved Models in Statistical Mechanics. New York, NY: Academic Press.
Bean, D., Bickel, P. J., El Karoui, N., and Yu, B. 2013. Optimal M-estimation in high-dimensional regression. Proceedings of the National Academy of Sciences of the USA, 110(36), 14563–14568.
Belloni, A., Chernozhukov, V., and Wang, L. 2011. Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791–806.
Bennett, G. 1962. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57(297), 33–45.
Bento, J., and Montanari, A. 2009 (December). Which graphical models are difficult to learn? In: Proceedings of the NIPS Conference.
Berlinet, A., and Thomas-Agnan, C. 2004. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Norwell, MA: Kluwer Academic.
Bernstein, S. N. 1937. On certain modifications of Chebyshev’s inequality. Doklady Akademii Nauk SSSR, 16(6), 275–277.
Berthet, Q., and Rigollet, P. 2013 (June). Computational lower bounds for sparse PCA. In: Conference on Computational Learning Theory.
Bertsekas, D. P. 2003. Convex Analysis and Optimization. Boston, MA: Athena Scientific.
Besag, J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36, 192–236.
Besag, J. 1975. Statistical analysis of non-lattice data. The Statistician, 24(3), 179–195.
Besag, J. 1977. Efficiency of pseudolikelihood estimation for simple Gaussian fields. Biometrika, 64(3), 616–618.
Bethe, H. A. 1935. Statistics theory of superlattices. Proceedings of the Royal Society of London, Series A, 150(871), 552–575.
Bhatia, R. 1997. Matrix Analysis. Graduate Texts in Mathematics. New York, NY: Springer.
Bickel, P. J., and Doksum, K. A. 2015. Mathematical Statistics: Basic Ideas and Selected Topics. Boca Raton, FL: CRC Press.
Bickel, P. J., and Levina, E. 2008a. Covariance regularization by thresholding. Annals of Statistics, 36(6), 2577–2604.
Bickel, P. J., and Levina, E. 2008b. Regularized estimation of large covariance matrices. Annals of Statistics, 36(1), 199–227.
Bickel, P. J., Ritov, Y., and Tsybakov, A. B. 2009. Simultaneous analysis of lasso and Dantzig selector. Annals of Statistics, 37(4), 1705–1732.
Birgé, L. 1983. Approximation dans les espaces metriques et theorie de l’estimation. Z. Wahrsch. verw. Gebiete, 65, 181–327.
Birgé, L. 1987. Estimating a density under order restrictions: Non-asymptotic minimax risk. Annals of Statistics, 15(3), 995–1012.
Birgé, L. 2005. A new lower bound for multiple hypothesis testing. IEEE Transactions on Information Theory, 51(4), 1611–1614.
Birgé, L., and Massart, P. 1995. Estimation of integral functionals of a density. Annals of Statistics, 23(1), 11–29.
Birnbaum, A., Johnstone, I. M., Nadler, B., and Paul, D. 2012. Minimax bounds for sparse PCA with noisy high-dimensional data. Annals of Statistics, 41(3), 1055–1084.
Bobkov, S. G. 1999. Isoperimetric and analytic inequalities for log-concave probability measures. Annals of Probability, 27(4), 1903–1921.
Bobkov, S. G., and Götze, F. 1999. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. Journal of Functional Analysis, 163, 1–28.
Bobkov, S. G., and Ledoux, M. 2000. From Brunn-Minkowski to Brascamp-Lieb and to logarithmic Sobolev inequalities. Geometric and Functional Analysis, 10, 1028–1052.
Borgwardt, K., Gretton, A., Rasch, M., Kriegel, H. P., Schölkopf, B., and Smola, A. J. 2006. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22(14), 49–57.
Borwein, J., and Lewis, A. 1999. Convex Analysis. New York, NY: Springer.
Boser, B. E., Guyon, I. M., and Vapnik, V. N. 1992. A training algorithm for optimal margin classifiers. Pages 144–152 of: Proceedings of the Conference on Learning Theory (COLT). New York, NY: ACM.
Boucheron, S., Lugosi, G., and Massart, P. 2003. Concentration inequalities using the entropy method. Annals of Probability, 31(3), 1583–1614.
Boucheron, S., Lugosi, G., and Massart, P. 2013. Concentration inequalities: A nonasymptotic theory of independence. Oxford, UK: Oxford University Press.
Bourgain, J., Dirksen, S., and Nelson, J. 2015. Toward a unified theory of sparse dimensionality reduction in Euclidean space. Geometric and Functional Analysis, 25(4).
Bousquet, O. 2002. A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus de l’Académie des Sciences, Paris, Série I, 334, 495–500.
Bousquet, O. 2003. Concentration inequalities for sub-additive functions using the entropy method. Stochastic Inequalities and Applications, 56, 213–247.
Boyd, S., and Vandenberghe, L. 2004. Convex optimization. Cambridge, UK: Cambridge University Press.
Brascamp, H. J., and Lieb, E. H. 1976. On extensions of the Brunn–Minkowski and Prékopa–Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis, 22, 366–389.
Breiman, L. 1992. Probability. Classics in Applied Mathematics. Philadelphia, PA: SIAM.
Bresler, G. 2014. Efficiently learning Ising models on arbitrary graphs. Tech. rept. MIT.
Bresler, G., Mossel, E., and Sly, A. 2013. Reconstruction of Markov Random Fields from samples: Some observations and algorithms. SIAM Journal on Computing, 42(2), 563–578.
Bronshtein, E. M. 1976. ϵ-entropy of convex sets and functions. Siberian Mathematical Journal, 17, 393–398.
Brown, L. D. 1986. Fundamentals of statistical exponential families. Hayward, CA: Institute of Mathematical Statistics.
Brunk, H. D. 1955. Maximum likelihood estimates of monotone parameters. Annals of Math. Statistics, 26, 607–616.
Brunk, H. D. 1970. Estimation of isotonic regression. Pages 177–197 of: Nonparametric techniques in statistical inference. New York, NY: Cambridge University Press.
Bühlmann, P., and van de Geer, S. 2011. Statistics for high-dimensional data. Springer Series in Statistics. Springer.
Buja, A., Hastie, T. J., and Tibshirani, R. 1989. Linear smoothers and additive models. Annals of Statistics, 17(2), 453–510.
Buldygin, V. V., and Kozachenko, Y. V. 2000. Metric characterization of random variables and random processes. Providence, RI: American Mathematical Society.
Bunea, F., Tsybakov, A. B., and Wegkamp, M. 2007. Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics, 169–194.
Bunea, F., She, Y., and Wegkamp, M. 2011. Optimal selection of reduced rank estimators of high-dimensional matrices. Annals of Statistics, 39(2), 1282–1309.
Cai, T. T., Zhang, C. H., and Zhou, H. H. 2010. Optimal rates of convergence for covariance matrix estimation. Annals of Statistics, 38(4), 2118–2144.
Cai, T. T., Liu, W., and Luo, X. 2011. A constrained ℓ1-minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106, 594–607.
Cai, T. T., Liang, T., and Rakhlin, A. 2015. Computational and statistical boundaries for submatrix localization in a large noisy matrix. Tech. rept. Univ. Penn.
Candès, E. J., and Plan, Y. 2010. Matrix completion with noise. Proceedings of the IEEE, 98(6), 925–936.
Candès, E. J., and Recht, B. 2009. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772.
Candès, E. J., and Tao, T. 2005. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203–4215.
Candès, E. J., and Tao, T. 2007. The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics, 35(6), 2313–2351.
Candès, E. J., Li, X., Ma, Y., and Wright, J. 2011. Robust principal component analysis? Journal of the ACM, 58(3), 11 (37pp).
Candès, E. J., Strohmer, T., and Voroninski, V. 2013. PhaseLift: exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8), 1241–1274.
Cantelli, F. P. 1933. Sulla determinazione empirica della legge di probabilita. Giornale dell’Istituto Italiano degli Attuari, 4, 421–424.
Carl, B., and Pajor, A. 1988. Gelfand numbers of operators with values in a Hilbert space. Inventiones Mathematicae, 94, 479–504.
Carl, B., and Stephani, I. 1990. Entropy, Compactness and the Approximation of Operators. Cambridge Tracts in Mathematics. Cambridge, UK: Cambridge University Press.
Carlen, E. 2009. Trace inequalities and quantum entropy: an introductory course. In: Entropy and the Quantum. Providence, RI: American Mathematical Society.
Carroll, R. J., Ruppert, D., and Stefanski, L. A. 1995. Measurement Error in Nonlinear Models. Boca Raton, FL: Chapman & Hall/CRC.
Chai, A., Moscoso, M., and Papanicolaou, G. 2011. Array imaging using intensity-only measurements. Inverse Problems, 27(1), 1–15.
Chandrasekaran, V., Sanghavi, S., Parrilo, P. A., and Willsky, A. S. 2011. Rank-Sparsity Incoherence for Matrix Decomposition. SIAM Journal on Optimization, 21, 572–596.
Chandrasekaran, V., Recht, B., Parrilo, P. A., and Willsky, A. S. 2012a. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6), 805–849.
Chandrasekaran, V., Parrilo, P. A., and Willsky, A. S. 2012b. Latent variable graphical model selection via convex optimization. Annals of Statistics, 40(4), 1935–1967.
Chatterjee, S. 2005 (October). An error bound in the Sudakov-Fernique inequality. Tech. rept. UC Berkeley. arXiv:math.PR/0510424.
Chatterjee, S. 2007. Stein’s method for concentration inequalities. Probability Theory and Related Fields, 138(1–2), 305–321.
Chatterjee, S., Guntuboyina, A., and Sen, B. 2015. On risk bounds in isotonic and other shape restricted regression problems. Annals of Statistics, 43(4), 1774–1800.
Chen, S., Donoho, D. L., and Saunders, M. A. 1998. Atomic decomposition by basis pursuit. SIAM J. Sci. Computing, 20(1), 33–61.
Chernoff, H. 1952. A measure of asymptotic efficiency for tests of a hypothesis based on a sum of observations. Annals of Mathematical Statistics, 23, 493–507.
Chernozhukov, V., Chetverikov, D., and Kato, K. 2013. Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Tech. rept. MIT.
Chung, F.R.K. 1991. Spectral Graph Theory. Providence, RI: American Mathematical Society.
Clifford, P. 1990. Markov random fields in statistics. In: Grimmett, G.R., and Welsh, D. J. A. (eds), Disorder in physical systems. Oxford Science Publications.
Cohen, A., Dahmen, W., and DeVore, R. A. 2008. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1), 211–231.
Cormode, G. 2012. Synopses for massive data: Samples, histograms, wavelets and sketches. Foundations and Trends in Databases, 4(2), 1–294.
Cover, T.M., and Thomas, J.A. 1991. Elements of Information Theory. New York, NY: Wiley.
Cule, M., Samworth, R. J., and Stewart, M. 2010. Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. B, 62, 545–607.
Dalalyan, A. S., Hebiri, M., and Lederer, J. 2014. On the prediction performance of the Lasso. Tech. rept. ENSAE. arXiv:1402.1700, to appear in Bernoulli.
d’Aspremont, A., El Ghaoui, L., Jordan, M. I., and Lanckriet, G. R. 2007. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3), 434–448.
d’Aspremont, A., Banerjee, O., and El Ghaoui, L. 2008. First order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and Its Applications, 30(1), 55–66.
Davidson, K. R., and Szarek, S. J. 2001. Local operator theory, random matrices, and Banach spaces. Pages 317–336 of: Handbook of Banach Spaces, vol. 1. Amsterdam, NL: Elsevier.
Dawid, A. P. 2007. The geometry of proper scoring rules. Annals of the Institute of Statistical Mathematics, 59, 77–93.
de La Pena, V., and Giné, E. 1999. Decoupling: From dependence to independence. New York, NY: Springer.
Dembo, A. 1997. Information inequalities and concentration of measure. Annals of Probability, 25(2), 927–939.
Dembo, A., and Zeitouni, O. 1996. Transportation approach to some concentration inequalities in product spaces. Electronic Communications in Probability, 1, 83–90.
DeVore, R. A., and Lorentz, G. G. 1993. Constructive Approximation. New York, NY: Springer.
Devroye, L., and Györfi, L. 1986. Nonparametric density estimation: the L1 view. New York, NY: Wiley.
Donoho, D. L. 2006a. For most large underdetermined systems of linear equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Communications on Pure and Applied Mathematics, 59(7), 907–934.
Donoho, D. L. 2006b. For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6), 797–829.
Donoho, D. L., and Huo, X. 2001. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory, 47(7), 2845–2862.
Donoho, D. L., and Johnstone, I. M. 1994. Minimax risk over ℓp-balls for ℓq-error. Probability Theory and Related Fields, 99, 277–303.
Donoho, D. L., and Montanari, A. 2013. High dimensional robust M-estimation: asymptotic variance via approximate message passing. Tech. rept. Stanford University. Posted as arxiv:1310.7320.
Donoho, D. L., and Stark, P. B. 1989. Uncertainty principles and signal recovery. SIAM Journal of Applied Mathematics, 49, 906–931.
Donoho, D. L., and Tanner, J. M. 2008. Counting faces of randomly-projected polytopes when the projection radically lowers dimension. Journal of the American Mathematical Society, July.
Duchi, J. C., Wainwright, M. J., and Jordan, M. I. 2013. Local privacy and minimax bounds: Sharp rates for probability estimation. Tech. rept. UC Berkeley.
Duchi, J. C., Wainwright, M. J., and Jordan, M. I. 2014. Privacy-aware learning. Journal of the ACM, 61(6), Article 37.
Dudley, R. M. 1967. The sizes of compact subsets of Hilbert spaces and continuity of Gaussian processes. Journal of Functional Analysis, 1, 290–330.
Dudley, R. M. 1978. Central limit theorems for empirical measures. Annals of Probability, 6, 899–929.
Dudley, R. M. 1999. Uniform central limit theorems. Cambridge, UK: Cambridge University Press.
Dümbgen, L., Samworth, R. J., and Schuhmacher, D. 2011. Approximation by log-concave distributions with applications to regression. Annals of Statistics, 39(2), 702–730.
Durrett, R. 2010. Probability: Theory and examples. Cambridge, UK: Cambridge University Press.
Dvoretsky, A., Kiefer, J., and Wolfowitz, J. 1956. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Annals of Mathematical Statistics, 27, 642–669.
Eggermont, P. P. B., and LaRiccia, V. N. 2001. Maximum penalized likelihood estimation: V. I Density estimation. Springer Series in Statistics, vol. 1. New York, NY: Springer.
Eggermont, P. P. B., and LaRiccia, V. N. 2007. Maximum penalized likelihood estimation: V. II Regression. Springer Series in Statistics, vol. 2. New York, NY: Springer.
El Karoui, N. 2008. Operator norm consistent estimation of large-dimensional sparse covariance matrices. Annals of Statistics, 36(6), 2717–2756.
El Karoui, N. 2013. Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results. Tech. rept. UC Berkeley. Posted as arxiv:1311.2445.
El Karoui, N., Bean, D., Bickel, P. J., and Yu, B. 2013. On robust regression with high-dimensional predictors. Proceedings of the National Academy of Sciences of the USA, 110(36), 14557–14562.
Elad, M., and Bruckstein, A. M. 2002. A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Transactions on Information Theory, 48(9), 2558–2567.
Fan, J., and Li, R. 2001. Variable selection via non-concave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
Fan, J., and Lv, J. 2011. Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57(8), 5467–5484.
Fan, J., Liao, Y., and Mincheva, M. 2013. Large covariance estimation by thresholding principal orthogonal components. Journal of the Royal Statistical Society B, 75, 603–680.
Fan, J., Xue, L., and Zou, H. 2014. Strong oracle optimality of folded concave penalized estimation. Annals of Statistics, 42(3), 819–849.
Fazel, M. 2002. Matrix Rank Minimization with Applications. Ph.D. thesis, Stanford. Available online: http://faculty.washington.edu/mfazel/thesis-final.pdf.
Fernique, X. M. 1974. Des resultats nouveaux sur les processus Gaussiens. Comptes Rendus de l’Académie des Sciences, Paris, 278, A363–A365.
Feuer, A., and Nemirovski, A. 2003. On sparse representation in pairs of bases. IEEE Transactions on Information Theory, 49(6), 1579–1581.
Fienup, J. R. 1982. Phase retrieval algorithms: a comparison. Applied Optics, 21(15), 2758–2769.
Fienup, J. R., and Wackerman, C. C. 1986. Phase-retrieval stagnation problems and solutions. Journal of the Optical Society of America A, 3, 1897–1907.
Fletcher, A. K., Rangan, S., and Goyal, V. K. 2009. Necessary and Sufficient Conditions for Sparsity Pattern Recovery. IEEE Transactions on Information Theory, 55(12), 5758–5772.
Foygel, R., and Srebro, N. 2011. Fast rate and optimistic rate for ℓ1-regularized regression. Tech. rept. Toyota Technological Institute. arXiv:1108.037v1.
Friedman, J. H., and Stuetzle, W. 1981. Projection pursuit regression. Journal of the American Statistical Association, 76(376), 817–823.
Friedman, J. H., and Tukey, J. W. 1994. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C-23, 881–889.
Friedman, J. H., Hastie, T. J., and Tibshirani, R. 2007. Sparse inverse covariance estimation with the graphical Lasso. Biostatistics.
Fuchs, J. J. 2004. Recovery of exact sparse representations in the presence of noise. Pages 533–536 of: ICASSP, vol. 2.
Gallager, R. G. 1968. Information theory and reliable communication. New York, NY: Wiley.
Gao, C., Ma, Z., and Zhou, H. H. 2015. Sparse CCA: Adaptive estimation and computational barriers. Tech. rept. Yale University.
Gardner, R. J. 2002. The Brunn-Minkowski inequality. Bulletin of the American Mathematical Society, 39, 355–405.
Geman, S. 1980. A limit theorem for the norm of random matrices. Annals of Probability, 8(2), 252–261.
Geman, S., and Geman, D. 1984. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Geman, S., and Hwang, C. R. 1982. Nonparametric maximum likelihood estimation by the method of sieves. Annals of Statistics, 10(2), 401–414.
Glivenko, V. 1933. Sulla determinazione empirica della legge di probabilita. Giornale dell’Istituto Italiano degli Attuari, 4, 92–99.
Gneiting, T., and Raftery, A. E. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.
Goldberg, K., Roeder, T., Gupta, D., and Perkins, C. 2001. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2), 133–151.
Good, I. J., and Gaskins, R. A. 1971. Nonparametric roughness penalties for probability densities. Biometrika, 58, 255–277.
Gordon, Y. 1985. Some inequalities for Gaussian processes and applications. Israel Journal of Mathematics, 50, 265–289.
Gordon, Y. 1986. On Milman’s inequality and random subspaces which escape through a mesh in R^n. Pages 84–106 of: Geometric aspects of functional analysis. Lecture Notes in Mathematics, vol. 1317. Springer-Verlag.
Gordon, Y. 1987. Elliptically contoured distributions. Probability Theory and Related Fields, 76, 429–438.
Götze, F., and Tikhomirov, A. 2004. Rate of convergence in probability to the Marčenko-Pastur law. Bernoulli, 10(3), 503–548.
Gerchberg, R. W., and Saxton, W. O. 1972. A practical algorithm for the determination of phase from image and diffraction plane intensities. Optik, 35, 237–246.
Greenshtein, E., and Ritov, Y. 2004. Persistency in high dimensional linear predictor-selection and the virtue of over-parametrization. Bernoulli, 10, 971–988.
Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. 2012. A kernel two-sample test. Journal of Machine Learning Research, 13, 723–773.
Griffin, D., and Lim, J. 1984. Signal estimation from modified short-time Fourier transforms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2), 236–243.
Grimmett, G. R. 1973. A theorem about random fields. Bulletin of the London Mathematical Society, 5, 81–84.
Gross, D. 2011. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3), 1548–1566.
Gross, L. 1975. Logarithmic Sobolev inequalities. American Journal Math., 97, 1061–1083.
Gu, C. 2002. Smoothing spline ANOVA models. Springer Series in Statistics. New York, NY: Springer.
Guédon, O., and Litvak, A. E. 2000. Euclidean projections of a p-convex body. Pages 95–108 of: Geometric aspects of functional analysis. Springer.
Guntuboyina, A. 2011. Lower bounds for the minimax risk using f-divergences and applications. IEEE Transactions on Information Theory, 57(4), 2386–2399.
Guntuboyina, A., and Sen, B. 2013. Covering numbers for convex functions. IEEE Transactions on Information Theory, 59, 1957–1965.
Gyorfi, L., Kohler, M., Krzyzak, A., and Walk, H. 2002. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer.
Hammersley, J. M., and Clifford, P. 1971. Markov fields on finite graphs and lattices. Unpublished.
Hanson, D. L., and Pledger, G. 1976. Consistency in concave regression. Annals of Statistics, 4, 1038–1050.
Hanson, D. L., and Wright, F. T. 1971. A bound on tail probabilities for quadratic forms in independent random variables. Annals of Mathematical Statistics, 42(3), 1079–1083.
Härdle, W. K., and Stoker, T. M. 1989. Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84, 986–995.
Härdle, W. K., Hall, P., and Ichimura, H. 1993. Optimal smoothing in single-index models. Annals of Statistics, 21, 157–178.
Härdle, W. K., Müller, M., Sperlich, S., and Werwatz, A. 2004. Nonparametric and semiparametric models. Springer Series in Statistics. New York, NY: Springer.
Harper, L. H. 1966. Optimal numberings and isoperimetric problems on graphs. Journal of Combinatorial Theory, 1, 385–393.
Harrison, R. W. 1993. Phase problem in crystallography. Journal of the Optical Society of America A, 10(5), 10461055.CrossRefGoogle Scholar
Hasminskii, R. Z. 1978. A lower bound on the risks of nonparametric estimates of densities in the uniform metric. Theory of Probability and Its Applications, 23, 794798.CrossRefGoogle Scholar
Hasminskii, R. Z., and Ibragimov, I. 1981. Statistical estimation: Asymptotic theory. New York, NY: Springer.Google Scholar
Hasminskii, R. Z., and Ibragimov, I. 1990. On density estimation in the view of Kolmogorov’s ideas in approximation theory. Annals of Statistics, 18(3), 999–1010.
Hastie, T. J., and Tibshirani, R. 1986. Generalized additive models. Statistical Science, 1(3), 297–310.
Hastie, T. J., and Tibshirani, R. 1990. Generalized Additive Models. Boca Raton, FL: Chapman & Hall/CRC.
Hildreth, C. 1954. Point estimates of ordinates of concave functions. Journal of the American Statistical Association, 49, 598–619.
Hiriart-Urruty, J., and Lemaréchal, C. 1993. Convex Analysis and Minimization Algorithms. Vol. 1. New York, NY: Springer.
Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13–30.
Hoerl, A. E., and Kennard, R. W. 1970. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12, 55–67.
Höfling, H., and Tibshirani, R. 2009. Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 19, 883–906.
Holley, R., and Stroock, D. 1987. Log Sobolev inequalities and stochastic Ising models. Journal of Statistical Physics, 46(5), 1159–1194.
Horn, R. A., and Johnson, C. R. 1985. Matrix Analysis. Cambridge, UK: Cambridge University Press.
Horn, R. A., and Johnson, C. R. 1991. Topics in Matrix Analysis. Cambridge, UK: Cambridge University Press.
Hristache, M., Juditsky, A., and Spokoiny, V. 2001. Direct estimation of the index coefficient in a single index model. Annals of Statistics, 29, 595–623.
Hsu, D., Kakade, S. M., and Zhang, T. 2012a. Tail inequalities for sums of random matrices that depend on the intrinsic dimension. Electronic Communications in Probability, 17(14), 1–13.
Hsu, D., Kakade, S. M., and Zhang, T. 2012b. A tail inequality for quadratic forms of sub-Gaussian random vectors. Electronic Journal of Probability, 52, 1–6.
Huang, J., and Zhang, T. 2010. The benefit of group sparsity. Annals of Statistics, 38(4), 1978–2004.
Huang, J., Ma, S., and Zhang, C. H. 2008. Adaptive Lasso for sparse high-dimensional regression models. Statistica Sinica, 18, 1603–1618.
Huber, P. J. 1973. Robust regression: Asymptotics, conjectures and Monte Carlo. Annals of Statistics, 1(5), 799–821.
Huber, P. J. 1985. Projection pursuit. Annals of Statistics, 13(2), 435–475.
Ichimura, H. 1993. Semiparametric least squares (SLS) and weighted (SLS) estimation of single index models. Journal of Econometrics, 58, 71–120.
Ising, E. 1925. Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 31(1), 253–258.
Iturria, S. J., Carroll, R. J., and Firth, D. 1999. Polynomial Regression and Estimating Functions in the Presence of Multiplicative Measurement Error. Journal of the Royal Statistical Society B, 61, 547–561.
Izenman, A. J. 1975. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5, 248–264.
Izenman, A. J. 2008. Modern multivariate statistical techniques: Regression, classification and manifold learning. New York, NY: Springer.
Jacob, L., Obozinski, G., and Vert, J. P. 2009. Group Lasso with overlap and graph Lasso. Pages 433–440 of: International Conference on Machine Learning (ICML).
Jalali, A., Ravikumar, P., Sanghavi, S., and Ruan, C. 2010. A Dirty Model for Multi-task Learning. Pages 964–972 of: Advances in Neural Information Processing Systems 23.
Johnson, W. B., and Lindenstrauss, J. 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 189–206.
Johnstone, I. M. 2001. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29(2), 295–327.
Johnstone, I. M. 2015. Gaussian estimation: Sequence and wavelet models. New York, NY: Springer.
Johnstone, I. M., and Lu, A. Y. 2009. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104, 682–693.
Jolliffe, I. T. 2004. Principal Component Analysis. New York, NY: Springer.
Jolliffe, I. T., Trendafilov, N. T., and Uddin, M. 2003. A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531–547.
Juditsky, A., and Nemirovski, A. 2000. Functional aggregation for nonparametric regression. Annals of Statistics, 28, 681–712.
Kahane, J. P. 1986. Une inégalité du type de Slepian et Gordon sur les processus Gaussiens. Israel Journal of Mathematics, 55, 109–110.
Kalisch, M., and Bühlmann, P. 2007. Estimating high-dimensional directed acyclic graphs with the PC algorithm. Journal of Machine Learning Research, 8, 613–636.
Kane, D. M., and Nelson, J. 2014. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM, 61(1).
Kantorovich, L. V., and Rubinstein, G. S. 1958. On the space of completely additive functions. Vestnik Leningrad Univ. Ser. Math. Mekh. i. Astron, 13(7), 52–59. In Russian.
Keener, R. W. 2010. Theoretical Statistics: Topics for a Core Class. New York, NY: Springer.
Keshavan, R. H., Montanari, A., and Oh, S. 2010a. Matrix Completion from Few Entries. IEEE Transactions on Information Theory, 56(6), 2980–2998.
Keshavan, R. H., Montanari, A., and Oh, S. 2010b. Matrix Completion from Noisy Entries. Journal of Machine Learning Research, 11(July), 2057–2078.
Kim, Y., Kim, J., and Kim, Y. 2006. Blockwise sparse regression. Statistica Sinica, 16(2).
Kimeldorf, G., and Wahba, G. 1971. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33, 82–95.
Klein, T., and Rio, E. 2005. Concentration around the mean for maxima of empirical processes. Annals of Probability, 33(3), 1060–1077.
Koller, D., and Friedman, N. 2010. Graphical Models. New York, NY: MIT Press.
Kolmogorov, A. N. 1956. Asymptotic characterization of some completely bounded metric spaces. Doklady Akademii Nauk SSSR, 108, 585–589.
Kolmogorov, A. N. 1958. Linear dimension of topological vector spaces. Doklady Akademii Nauk SSSR, 120, 239–241.
Kolmogorov, A. N., and Tikhomirov, B. 1959. ϵ-entropy and ϵ-capacity of sets in functional spaces. Uspekhi Mat. Nauk., 86, 3–86. Appeared in English as 1961. American Mathematical Society Translations, 17, 277–364.
Koltchinskii, V. 2001. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47(5), 1902–1914.
Koltchinskii, V. 2006. Local Rademacher complexities and oracle inequalities in risk minimization. Annals of Statistics, 34(6), 2593–2656.
Koltchinskii, V., and Panchenko, D. 2000. Rademacher processes and bounding the risk of function learning. Pages 443–459 of: High-dimensional probability II. Springer.
Koltchinskii, V., and Yuan, M. 2010. Sparsity in multiple kernel learning. Annals of Statistics, 38, 3660–3695.
Koltchinskii, V., Lounici, K., and Tsybakov, A. B. 2011. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Annals of Statistics, 39, 2302–2329.
Kontorovich, L. A., and Ramanan, K. 2008. Concentration inequalities for dependent random variables via the martingale method. Annals of Probability, 36(6), 2126–2158.
Kruskal, J. B. 1969. Towards a practical method which helps uncover the structure of a set of multivariate observation by finding the linear transformation which optimizes a new ‘index of condensation’. In: Statistical computation. New York, NY: Academic Press.
Kühn, T. 2001. A lower estimate for entropy numbers. Journal of Approximation Theory, 110, 120–124.
Kullback, S., and Leibler, R. A. 1951. On information and sufficiency. Annals of Mathematical Statistics, 22(1), 79–86.
Lam, C., and Fan, J. 2009. Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation. Annals of Statistics, 37, 4254–4278.
Laurent, M. 2001. Matrix Completion Problems. Pages 221–229 of: The Encyclopedia of Optimization. Kluwer Academic.
Laurent, M. 2003. A comparison of the Sherali-Adams, Lovász-Schrijver and Lasserre relaxations for 0-1 programming. Mathematics of Operations Research, 28, 470–496.
Lauritzen, S. L. 1996. Graphical Models. Oxford: Oxford University Press.
Le Cam, L. 1973. Convergence of estimates under dimensionality restrictions. Annals of Statistics, January.
Ledoux, M. 1996. On Talagrand’s deviation inequalities for product measures. ESAIM: Probability and Statistics, 1(July), 63–87.
Ledoux, M. 2001. The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs. Providence, RI: American Mathematical Society.
Ledoux, M., and Talagrand, M. 1991. Probability in Banach Spaces: Isoperimetry and Processes. New York, NY: Springer.
Lee, J. D., Sun, Y., and Taylor, J. 2013. On model selection consistency of M-estimators with geometrically decomposable penalties. Tech. rept. Stanford University. arXiv:1305.7477v4.
Leindler, L. 1972. On a certain converse of Hölder’s inequality. Acta Scientiarum Mathematicarum (Szeged), 33, 217–223.
Levy, S., and Fullagar, P. K. 1981. Reconstruction of a sparse spike train from a portion of its spectrum and application to high-resolution deconvolution. Geophysics, 46(9), 1235–1243.
Lieb, E. H. 1973. Convex trace functions and the Wigner-Yanase-Dyson conjecture. Advances in Mathematics, 11, 267–288.
Lindley, D. V. 1956. On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27(4), 986–1005.
Liu, H., Lafferty, J. D., and Wasserman, L. A. 2009. The nonparanormal: Semiparametric estimation of high-dimensional undirected graphs. Journal of Machine Learning Research, 10, 1–37.
Liu, H., Han, F., Yuan, M., Lafferty, J. D., and Wasserman, L. A. 2012. High-dimensional semiparametric Gaussian copula graphical models. Annals of Statistics, 40(4), 2293–2326.
Loh, P., and Wainwright, M. J. 2012. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Annals of Statistics, 40(3), 1637–1664.
Loh, P., and Wainwright, M. J. 2013. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. Annals of Statistics, 41(6), 3022–3049.
Loh, P., and Wainwright, M. J. 2015. Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Journal of Machine Learning Research, 16(April), 559–616.
Loh, P., and Wainwright, M. J. 2017. Support recovery without incoherence: A case for nonconvex regularization. Annals of Statistics, 45(6), 2455–2482. Appeared as arXiv:1412.5632.
Lorentz, G. G. 1966. Metric entropy and approximation. Bulletin of the AMS, 72(6), 903–937.
Lounici, K., Pontil, M., Tsybakov, A. B., and van de Geer, S. 2011. Oracle inequalities and optimal inference under group sparsity. Annals of Statistics, 39(4), 2164–2204.
Lovász, L., and Schrijver, A. 1991. Cones of matrices and set-functions and 0–1 optimization. SIAM Journal on Optimization, 1, 166–190.
Ma, Z. 2010. Contributions to high-dimensional principal component analysis. Ph.D. thesis, Department of Statistics, Stanford University.
Ma, Z. 2013. Sparse principal component analysis and iterative thresholding. Annals of Statistics, 41(2), 772–801.
Ma, Z., and Wu, Y. 2013. Computational barriers in minimax submatrix detection. arXiv preprint arXiv:1309.5914.
Mackey, L. W., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. 2014. Matrix concentration inequalities via the method of exchangeable pairs. Annals of Probability, 42(3), 906–945.
Mahoney, M. W. 2011. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2), 123–224.
Marton, K. 1996a. Bounding d-distance by information divergence: a method to prove measure concentration. Annals of Probability, 24, 857–866.
Marton, K. 1996b. A measure concentration inequality for contracting Markov chains. Geometric and Functional Analysis, 6(3), 556–571.
Marton, K. 2004. Measure concentration for Euclidean distance in the case of dependent random variables. Annals of Probability, 32(3), 2526–2544.
Marčenko, V. A., and Pastur, L. A. 1967. Distribution of eigenvalues for some sets of random matrices. Annals of Probability, 4(1), 457–483.
Massart, P. 1990. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability, 18, 1269–1283.
Massart, P. 2000. Some applications of concentration inequalities to statistics. Annales de la Faculté des Sciences de Toulouse, IX, 245–303.
Maurey, B. 1991. Some deviation inequalities. Geometric and Functional Analysis, 1, 188–197.
McDiarmid, C. 1989. On the method of bounded differences. Pages 148–188 of: Surveys in Combinatorics. London Mathematical Society Lecture Notes, no. 141. Cambridge, UK: Cambridge University Press.
Mehta, M. L. 1991. Random Matrices. New York, NY: Academic Press.
Meier, L., van de Geer, S., and Bühlmann, P. 2009. High-dimensional additive modeling. Annals of Statistics, 37, 3779–3821.
Meinshausen, N. 2008. A note on the lasso for graphical Gaussian model selection. Statistics and Probability Letters, 78(7), 880–884.
Meinshausen, N., and Bühlmann, P. 2006. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34, 1436–1462.
Mendelson, S. 2002. Geometric parameters of kernel machines. Pages 29–43 of: Proceedings of COLT.
Mendelson, S. 2010. Empirical processes with a bounded ψ1-diameter. Geometric and Functional Analysis, 20(4), 988–1027.
Mendelson, S. 2015. Learning without concentration. Journal of the ACM, 62(3), 1–25.
Mendelson, S., Pajor, A., and Tomczak-Jaegermann, N. 2007. Reconstruction of subgaussian operators. Geometric and Functional Analysis, 17(4), 1248–1282.
Mézard, M., and Montanari, A. 2008. Information, Physics and Computation. New York, NY: Oxford University Press.
Milman, V., and Schechtman, G. 1986. Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, vol. 1200. New York, NY: Springer.
Minsker, S. 2011. On some extensions of Bernstein’s inequality for self-adjoint operators. Tech. rept. Duke University.
Mitjagin, B. S. 1961. The approximation dimension and bases in nuclear spaces. Uspekhi Mat. Nauk, 61(16), 63–132.
Muirhead, R. J. 2008. Aspects of multivariate statistical theory. Wiley Series in Probability and Mathematical Statistics. New York, NY: Wiley.
Müller, A. 1997. Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 29(2), 429–443.
Negahban, S., and Wainwright, M. J. 2011a. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Annals of Statistics, 39(2), 1069–1097.
Negahban, S., and Wainwright, M. J. 2011b. Simultaneous support recovery in high-dimensional regression: Benefits and perils of ℓ1,∞-regularization. IEEE Transactions on Information Theory, 57(6), 3481–3863.
Negahban, S., and Wainwright, M. J. 2012. Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13(May), 1665–1697.
Negahban, S., Ravikumar, P., Wainwright, M. J., and Yu, B. 2010 (October). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Tech. rept. UC Berkeley. Arxiv pre-print 1010.2731v1, Version 1.
Negahban, S., Ravikumar, P., Wainwright, M. J., and Yu, B. 2012. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4), 538–557.
Nemirovski, A. 2000. Topics in non-parametric statistics. In: Bernard, P. (ed), Ecole d’Été de Probabilités de Saint-Flour XXVIII. Lecture Notes in Mathematics. Berlin, Germany: Springer.
Nesterov, Y. 1998. Semidefinite relaxation and nonconvex quadratic optimization. Optimization methods and software, 9(1), 141–160.
Netrapalli, P., Banerjee, S., Sanghavi, S., and Shakkottai, S. 2010. Greedy learning of Markov network structure. Pages 1295–1302 of: Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on. IEEE.
Obozinski, G., Wainwright, M. J., and Jordan, M. I. 2011. Union support recovery in high-dimensional multivariate regression. Annals of Statistics, 39(1), 1–47.
Oldenburg, D. W., Scheuer, T., and Levy, S. 1983. Recovery of the acoustic impedance from reflection seismograms. Geophysics, 48(10), 1318–1337.
Oliveira, R. I. 2010. Sums of random Hermitian matrices and an inequality by Rudelson. Electronic Communications in Probability, 15, 203–212.
Oliveira, R. I. 2013. The lower tail of random quadratic forms, with applications to ordinary least squares and restricted eigenvalue properties. Tech. rept. IMPA, Rio de Janeiro, Brazil.
Ortega, J. M., and Rheinboldt, W. C. 2000. Iterative Solution of Nonlinear Equations in Several Variables. Classics in Applied Mathematics. New York, NY: SIAM.
Pastur, L. A. 1972. On the spectrum of random matrices. Theoretical and Mathematical Physics, 10, 67–74.
Paul, D. 2007. Asymptotics of sample eigenstructure for a large-dimensional spiked covariance model. Statistica Sinica, 17, 1617–1642.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.
Petrov, V. V. 1995. Limit theorems of probability theory: Sequence of independent random variables. Oxford, UK: Oxford University Press.
Pilanci, M., and Wainwright, M. J. 2015. Randomized sketches of convex programs with sharp guarantees. IEEE Transactions on Information Theory, 9(61), 5096–5115.
Pinkus, A. 1985. N-Widths in Approximation Theory. New York: Springer.
Pisier, G. 1989. The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Mathematics, vol. 94. Cambridge, UK: Cambridge University Press.
Pollard, D. 1984. Convergence of Stochastic Processes. New York, NY: Springer.
Portnoy, S. 1984. Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large: I. Consistency. Annals of Statistics, 12(4), 1296–1309.
Portnoy, S. 1985. Asymptotic behavior of M-estimators of p regression parameters when p^2/n is large: II. Normal approximation. Annals of Statistics, 13(4), 1403–1417.
Portnoy, S. 1988. Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Annals of Statistics, 16(1), 356–366.
Prékopa, A. 1971. Logarithmic concave measures with application to stochastic programming. Acta Scientiarum Mathematicarum (Szeged), 32, 301–315.
Prékopa, A. 1973. On logarithmic concave measures and functions. Acta Scientiarum Mathematicarum (Szeged), 33, 335–343.
Rachev, S. T., and Ruschendorf, L. 1998. Mass Transportation Problems, Volume II, Applications. New York, NY: Springer.
Rachev, S. T., Klebanov, L., Stoyanov, S. V., and Fabozzi, F. 2013. The Method of Distances in the Theory of Probability and Statistics. New York, NY: Springer.
Rao, C. R. 1949. On some problems arising out of discrimination with multiple characters. Sankhya (Indian Journal of Statistics), 9(4), 343–366.
Raskutti, G., Wainwright, M. J., and Yu, B. 2010. Restricted eigenvalue conditions for correlated Gaussian designs. Journal of Machine Learning Research, 11(August), 2241–2259.
Raskutti, G., Wainwright, M. J., and Yu, B. 2011. Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. IEEE Transactions on Information Theory, 57(10), 6976–6994.
Raskutti, G., Wainwright, M. J., and Yu, B. 2012. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Journal of Machine Learning Research, 12(March), 389–427.
Raudys, V., and Young, D. M. 2004. Results in Statistical Discriminant Analysis: A Review of the Former Soviet Union Literature. Journal of Multivariate Analysis, 89(1), 1–35.
Ravikumar, P., Liu, H., Lafferty, J. D., and Wasserman, L. A. 2009. SpAM: sparse additive models. Journal of the Royal Statistical Society, Series B, 71(5), 1009–1030.
Ravikumar, P., Wainwright, M. J., and Lafferty, J. D. 2010. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3), 1287–1319.
Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. 2011. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980.
Recht, B. 2011. A Simpler Approach to Matrix Completion. Journal of Machine Learning Research, 12, 3413–3430.
Recht, B., Xu, W., and Hassibi, B. 2009. Null space conditions and thresholds for rank minimization. Tech. rept. U. Madison. Available at http://pages.cs.wisc.edu/brecht/papers/10.RecXuHas.Thresholds.pdf.
Recht, B., Fazel, M., and Parrilo, P. A. 2010. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization. SIAM Review, 52(3), 471–501.
Reeves, G., and Gastpar, M. 2008 (July). Sampling Bounds for Sparse Support Recovery in the Presence of Noise. In: International Symposium on Information Theory.
Reinsel, G. C., and Velu, R. P. 1998. Multivariate Reduced-Rank Regression. Lecture Notes in Statistics, vol. 136. New York, NY: Springer.
Ren, Z., and Zhou, H. H. 2012. Discussion: Latent variable graphical model selection via convex optimization. Annals of Statistics, 40(4), 1989–1996.
Richardson, T., and Urbanke, R. 2008. Modern Coding Theory. Cambridge University Press.
Rockafellar, R. T. 1970. Convex Analysis. Princeton: Princeton University Press.
Rohde, A., and Tsybakov, A. B. 2011. Estimation of high-dimensional low-rank matrices. Annals of Statistics, 39(2), 887–930.
Rosenbaum, M., and Tsybakov, A. B. 2010. Sparse recovery under matrix uncertainty. Annals of Statistics, 38, 2620–2651.
Rosenthal, H. P. 1970. On the subspaces of L^p (p > 2) spanned by sequences of independent random variables. Israel Journal of Mathematics, 8, 1546–1570.
Rothman, A. J., Bickel, P. J., Levina, E., and Zhu, J. 2008. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494–515.
Rudelson, M. 1999. Random vectors in the isotropic position. Journal of Functional Analysis, 164, 60–72.
Rudelson, M., and Vershynin, R. 2013. Hanson–Wright inequality and sub-Gaussian concentration. Electronic Communications in Probability, 18(82), 1–9.
Rudelson, M., and Zhou, S. 2013. Reconstruction from anisotropic random measurements. IEEE Transactions on Information Theory, 59(6), 3434–3447.
Rudin, W. 1964. Principles of Mathematical Analysis. New York, NY: McGraw-Hill.
Rudin, W. 1990. Fourier Analysis on Groups. New York, NY: Wiley-Interscience.
Samson, P. M. 2000. Concentration of measure inequalities for Markov chains and Φ-mixing processes. Annals of Probability, 28(1), 416–461.
Santhanam, N. P., and Wainwright, M. J. 2012. Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory, 58(7), 4117–4134.
Santosa, F., and Symes, W. W. 1986. Linear inversion of band-limited reflection seismograms. SIAM Journal on Scientific and Statistical Computing, 7(4), 1307–1330.
Saulis, L., and Statulevicius, V. 1991. Limit Theorems for Large Deviations. London: Kluwer Academic.
Schölkopf, B., and Smola, A. 2002. Learning with Kernels. Cambridge, MA: MIT Press.
Schütt, C. 1984. Entropy numbers of diagonal operators between symmetric Banach spaces. Journal of Approximation Theory, 40, 121–128.
Scott, D. W. 1992. Multivariate Density Estimation: Theory, Practice and Visualization. New York, NY: Wiley.
Seijo, E., and Sen, B. 2011. Nonparametric least squares estimation of a multivariate convex regression function. Annals of Statistics, 39(3), 1633–1657.
Serdobolskii, V. 2000. Multivariate Statistical Analysis. Dordrecht, The Netherlands: Kluwer Academic.
Shannon, C. E. 1948. A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
Shannon, C. E. 1949. Communication in the presence of noise. Proceedings of the IRE, 37(1), 10–21.
Shannon, C. E., and Weaver, W. 1949. The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.
Shao, J. 2007. Mathematical Statistics. New York, NY: Springer.
Shor, N. Z. 1987. Quadratic optimization problems. Soviet Journal of Computer and System Sciences, 25, 1–11.
Silverman, B. W. 1982. On the estimation of a probability density function by the maximum penalized likelihood method. Annals of Statistics, 10(3), 795–810.
Silverman, B. W. 1986. Density estimation for statistics and data analysis. Boca Raton, FL: CRC Press.
Silverstein, J. 1995. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. Journal of Multivariate Analysis, 55, 331–339.
Slepian, D. 1962. The one-sided barrier problem for Gaussian noise. Bell System Technical Journal, 42(2), 463–501.
Smale, S., and Zhou, D. X. 2003. Estimating the approximation error in learning theory. Analysis and Its Applications, 1(1), 1–25.
Spirtes, P., Glymour, C., and Scheines, R. 2000. Causation, Prediction and Search. Cambridge, MA: MIT Press.
Srebro, N. 2004. Learning with Matrix Factorizations. Ph.D. thesis, MIT. Available online: http://ttic.uchicago.edu/nati/Publications/thesis.pdf.
Srebro, N., Rennie, J., and Jaakkola, T. S. 2005a (December 2004). Maximum-margin matrix factorization. In: Advances in Neural Information Processing Systems 17 (NIPS 2004).
Srebro, N., Alon, N., and Jaakkola, T. S. 2005b (December). Generalization error bounds for collaborative prediction with low-rank matrices. In: Advances in Neural Information Processing Systems 17 (NIPS 2004).
Srivastava, N., and Vershynin, R. 2013. Covariance estimation for distributions with 2 + ϵ moments. Annals of Probability, 41, 3081–3111.
Steele, J. M. 1978. Empirical discrepancies and sub-additive processes. Annals of Probability, 6, 118–127.
Steinwart, I., and Christmann, A. 2008. Support vector machines. New York, NY: Springer.
Stewart, G. W. 1971. Error bounds for approximate invariant subspaces of closed linear operators. SIAM Journal on Numerical Analysis, 8(4), 796–808.
Stewart, G. W., and Sun, J. 1980. Matrix Perturbation Theory. New York, NY: Academic Press.
Stone, C. J. 1982. Optimal global rates of convergence for non-parametric regression. Annals of Statistics, 10(4), 1040–1053.
Stone, C. J. 1985. Additive regression and other non-parametric models. Annals of Statistics, 13(2), 689–705.
Szarek, S. J. 1991. Condition numbers of random matrices. J. Complexity, 7(2), 131–149.
Talagrand, M. 1991. A new isoperimetric inequality and the concentration of measure phenomenon. Pages 94–124 of: Lindenstrauss, J., and Milman, V. D. (eds), Geometric Aspects of Functional Analysis. Lecture Notes in Mathematics, vol. 1469. Berlin, Germany: Springer.
Talagrand, M. 1995. Concentration of measure and isoperimetric inequalities in product spaces. Publ. Math. I.H.E.S., 81, 73–205.
Talagrand, M. 1996a. New concentration inequalities in product spaces. Inventiones Mathematicae, 126, 503–563.
Talagrand, M. 1996b. A new look at independence. Annals of Probability, 24(1), 1–34.
Talagrand, M. 2000. The Generic Chaining. New York, NY: Springer.
Talagrand, M. 2003. Spin Glasses: A Challenge for Mathematicians. New York, NY: Springer.
Tibshirani, R. 1996. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.
Tibshirani, R., Saunders, M. A., Rosset, S., Zhu, J., and Knight, K. 2005. Sparsity and smoothness via the smoothed Lasso. Journal of the Royal Statistical Society B, 67(1), 91–108.
Tropp, J. A. 2006. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3), 1030–1051.
Tropp, J. A. 2010 (April). User-friendly tail bounds for matrix martingales. Tech. rept. Caltech.
Tsybakov, A. B. 2009. Introduction to non-parametric estimation. New York, NY: Springer.
Turlach, B., Venables, W.N., and Wright, S.J. 2005. Simultaneous variable selection. Technometrics, 27, 349–363.
van de Geer, S. 2000. Empirical Processes in M-Estimation. Cambridge University Press.
van de Geer, S. 2014. Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41, 72–86.
van de Geer, S., and Bühlmann, P. 2009. On the conditions used to prove oracle results for the Lasso. Electronic Journal of Statistics, 3, 1360–1392.
van der Vaart, A. W., and Wellner, J. A. 1996. Weak Convergence and Empirical Processes. New York, NY: Springer.
Vempala, S. 2004. The Random Projection Method. Discrete Mathematics and Theoretical Computer Science. Providence, RI: American Mathematical Society.
Vershynin, R. 2011. Introduction to the non-asymptotic analysis of random matrices. Tech. rept. Univ. Michigan.
Villani, C. 2008. Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften, vol. 338. New York, NY: Springer.
Vu, V. Q., and Lei, J. 2012. Minimax rates of estimation for sparse PCA in high dimensions. In: 15th Annual Conference on Artificial Intelligence and Statistics.
Wachter, K. 1978. The strong limits of random matrix spectra for samples matrices of independent elements. Annals of Probability, 6, 1–18.
Wahba, G. 1990. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PN: SIAM.Google Scholar
Wainwright, M. J. 2009a. Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting. IEEE Transactions on Information Theory, 55(December), 5728–5741.
Wainwright, M. J. 2009b. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(May), 2183–2202.
Wainwright, M. J. 2014. Constrained forms of statistical minimax: Computation, communication and privacy. In: Proceedings of the International Congress of Mathematicians.
Wainwright, M. J., and Jordan, M. I. 2008. Graphical models, exponential families and variational inference. Foundations and Trends in Machine Learning, 1(1–2), 1–305.
Waldspurger, I., d’Aspremont, A., and Mallat, S. 2015. Phase recovery, MaxCut and complex semidefinite programming. Mathematical Programming A, 149(1–2), 47–81.
Wang, T., Berthet, Q., and Samworth, R. J. 2014 (August). Statistical and computational trade-offs in estimation of sparse principal components. Tech. rept. arxiv:1408.5369. University of Cambridge.
Wang, W., Wainwright, M. J., and Ramchandran, K. 2010. Information-theoretic limits on sparse signal recovery: dense versus sparse measurement matrices. IEEE Transactions on Information Theory, 56(6), 2967–2979.
Wang, W., Ling, Y., and Xing, E. P. 2015. Collective Support Recovery for Multi-Design Multi-Response Linear Regression. IEEE Transactions on Information Theory, 61(1), 513–534.
Wasserman, L. A. 2006. All of Non-Parametric Statistics. Springer Series in Statistics. New York, NY: Springer.
Widom, H. 1963. Asymptotic behaviour of Eigenvalues of Certain Integral Operators. Transactions of the American Mathematical Society, 109, 278–295.
Widom, H. 1964. Asymptotic behaviour of Eigenvalues of Certain Integral Operators II. Archive for Rational Mechanics and Analysis, 17(3), 215–229.
Wigner, E. 1955. Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, 62, 548–564.
Wigner, E. 1958. On the distribution of the roots of certain symmetric matrices. Annals of Mathematics, 67, 325–327.
Williams, D. 1991. Probability with Martingales. Cambridge, UK: Cambridge University Press.
Witten, D., Tibshirani, R., and Hastie, T. J. 2009. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515–534.
Woodruff, D. 2014. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(10), 1–157.
Wright, F. T. 1973. A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric. Annals of Probability, 1(6), 1068–1070.
Xu, M., Chen, M., and Lafferty, J. D. 2014. Faithful variable selection for high dimensional convex regression. Tech. rept. Univ. Chicago. arxiv:1411.1805.
Xu, Q., and You, J. 2007. Covariate selection for linear errors-in-variables regression models. Communications in Statistics – Theory and Methods, 36(2), 375–386.
Xue, L., and Zou, H. 2012. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Annals of Statistics, 40(5), 2541–2571.
Yang, Y., and Barron, A. 1999. Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5), 1564–1599.
Ye, F., and Zhang, C. H. 2010. Rate minimaxity of the Lasso and Dantzig selector for the ℓq-loss in ℓr-balls. Journal of Machine Learning Research, 11, 3519–3540.
Yu, B. 1996. Assouad, Fano and Le Cam. Research Papers in Probability and Statistics: Festschrift in Honor of Lucien Le Cam, 423–435.
Yuan, M. 2010. High dimensional inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research, 11, 2261–2286.
Yuan, M., and Lin, Y. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B, 1(68), 49.
Yuan, M., and Lin, Y. 2007. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1), 19–35.
Yuan, X. T., and Zhang, T. 2013. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research, 14, 899–925.
Yurinsky, V. 1995. Sums and Gaussian Vectors. Lecture Notes in Mathematics. New York, NY: Springer.
Zhang, C. H. 2012. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2), 894–942.
Zhang, C. H., and Zhang, T. 2012. A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27(4), 576–593.
Zhang, Y., Wainwright, M. J., and Jordan, M. I. 2014 (June). Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. In: Proceedings of the Conference on Learning Theory (COLT). Full length version at http://arxiv.org/abs/1402.1918.
Zhang, Y., Wainwright, M. J., and Jordan, M. I. 2017. Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators. Electronic Journal of Statistics, 11, 752–799.
Zhao, P., and Yu, B. 2006. On model selection consistency of Lasso. Journal of Machine Learning Research, 7, 2541–2567.
Zhao, P., Rocha, G., and Yu, B. 2009. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A), 3468–3497.
Zhou, D. X. 2013. Density problem and approximation error in learning theory. Abstract and Applied Analysis, 2013(715683).
Zou, H. 2006. The Adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
Zou, H., and Hastie, T. J. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301–320.
Zou, H., and Li, R. 2008. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics, 36(4), 1509–1533.
