1 - Introduction to Information Theory and Data Science.

Published online by Cambridge University Press:  22 March 2021

Miguel R. D. Rodrigues
Affiliation:
University College London
Yonina C. Eldar
Affiliation:
Weizmann Institute of Science, Israel

Summary

The purpose of this chapter is to set the stage for the book and for the upcoming chapters. We first review classical information-theoretic problems and solutions. We then discuss emerging applications of information-theoretic methods in various data-science problems and, where applicable, refer the reader to related chapters in the book. Throughout this chapter, we highlight the perspectives, tools, and methods that play important roles in classical information-theoretic paradigms and in emerging areas of data science. Table 1.1 summarizes the topics covered in this chapter and indicates the chapters that can be read as a follow-up to each topic.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021
