
4 - Information-Theoretic Bounds on Sketching

Published online by Cambridge University Press: 22 March 2021

Miguel R. D. Rodrigues, University College London
Yonina C. Eldar, Weizmann Institute of Science, Israel

Summary

Approximate computation methods with provable performance guarantees are becoming important and relevant tools in practice. In this chapter we focus on sketching methods designed to reduce data dimensionality in computationally intensive tasks. Sketching can often provide better space, time, and communication complexity trade-offs while sacrificing only minimal accuracy. This chapter discusses the role of information theory in sketching methods for solving large-scale statistical estimation and optimization problems. We investigate fundamental lower bounds on the performance of sketching, and by exploring these lower bounds we obtain trade-offs between computation and accuracy. We employ Fano’s inequality and metric entropy to understand fundamental lower bounds on the accuracy of sketching, in a manner parallel to the information-theoretic techniques used in statistical minimax theory.
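As a rough illustration of the kind of sketching method the summary refers to, the following minimal example applies a Gaussian random projection to an overdetermined least-squares problem, replacing an n-row problem with an m-row sketched problem (m much smaller than n). All names, dimensions, and noise levels are illustrative assumptions, not constructions taken from the chapter.

```python
# Illustrative sketch only: Gaussian random projection applied to
# overdetermined least squares, min_x ||A x - b||_2.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 10_000, 50, 500        # n observations, d parameters, sketch size m << n
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Exact least-squares solution, for reference.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gaussian sketching matrix S (m x n); solve the smaller sketched problem
# min_x ||S A x - S b||_2, whose solve cost scales with m rather than n.
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

print("relative error vs. exact LS:",
      np.linalg.norm(x_sketch - x_ls) / np.linalg.norm(x_ls))
```

Increasing the sketch size m drives the approximation error down at the cost of more computation; the information-theoretic lower bounds discussed in the chapter characterize how small m can be made before a prescribed accuracy becomes unattainable.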

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021

