
16 - An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation

Published online by Cambridge University Press: 22 March 2021

Miguel R. D. Rodrigues, University College London
Yonina C. Eldar, Weizmann Institute of Science, Israel

Summary

Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano’s inequality. In this chapter, we provide a survey of Fano’s inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization.
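
For orientation, the display below gives a standard statement of Fano's inequality and the error-probability lower bound it yields in the reduction from estimation to multiple hypothesis testing that underlies the approach surveyed in this chapter. This is a minimal textbook-style sketch with standard notation chosen here (V, Y, \hat{V}, M, P_e); the chapter itself develops more general variants.

% Standard (conditional-entropy) form of Fano's inequality: for a Markov chain
% V -> Y -> \hat{V}, with V taking values in a finite set \mathcal{V},
\[
  H(V \mid \hat{V}) \;\le\; H_b(P_e) + P_e \log\bigl(|\mathcal{V}| - 1\bigr),
  \qquad P_e = \Pr[\hat{V} \ne V],
\]
% where H_b denotes the binary entropy function. When V is uniform on M >= 2
% hypotheses, bounding H_b(P_e) <= log 2 and applying the data-processing
% inequality I(V; \hat{V}) <= I(V; Y) gives the form typically used to prove
% minimax lower bounds in statistical estimation:
\[
  P_e \;\ge\; 1 - \frac{I(V; Y) + \log 2}{\log M}.
\]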

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


