Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- Notation
- Contributors
- 1 Introduction to Information Theory and Data Science.
- 2 An Information-Theoretic Approach to Analog-to-Digital Compression
- 3 Compressed Sensing via Compression Codes
- 4 Information-Theoretic Bounds on Sketching
- 5 Sample Complexity Bounds for Dictionary Learning from Vector- and Tensor-Valued Data
- 6 Uncertainty Relations and Sparse Signal Recovery
- 7 Understanding Phase Transitions via Mutual Information and MMSE
- 8 Computing Choice: Learning Distributions over Permutations
- 9 Universal Clustering
- 10 Information-Theoretic Stability and Generalization
- 11 Information Bottleneck and Representation Learning
- 12 Fundamental Limits in Model Selection for Modern Data Analysis
- 13 Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits
- 14 Distributed Statistical Inference with Compressed Data
- 15 Network Functional Compression
- 16 An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation
- Index
- References
11 - Information Bottleneck and Representation Learning
Published online by Cambridge University Press: 22 March 2021
- Frontmatter
- Dedication
- Contents
- Preface
- Notation
- Contributors
- 1 Introduction to Information Theory and Data Science.
- 2 An Information-Theoretic Approach to Analog-to-Digital Compression
- 3 Compressed Sensing via Compression Codes
- 4 Information-Theoretic Bounds on Sketching
- 5 Sample Complexity Bounds for Dictionary Learning from Vector- and Tensor-Valued Data
- 6 Uncertainty Relations and Sparse Signal Recovery
- 7 Understanding Phase Transitions via Mutual Information and MMSE
- 8 Computing Choice: Learning Distributions over Permutations
- 9 Universal Clustering
- 10 Information-Theoretic Stability and Generalization
- 11 Information Bottleneck and Representation Learning
- 12 Fundamental Limits in Model Selection for Modern Data Analysis
- 13 Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits
- 14 Distributed Statistical Inference with Compressed Data
- 15 Network Functional Compression
- 16 An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation
- Index
- References
Summary
A grand challenge in representation learning is the development of computational algorithms that learn the explanatory factors of variation behind high-dimensional data. Representation models (encoders) are often determined for optimizing performance on training data when the real objective is to generalize well to other (unseen) data. This chapter provides an overview of fundamental concepts in statistical learning theory and the information-bottleneck principle. This serves as a mathematical basis for the technical results, in which an upper bound to the generalization gap corresponding to the cross-entropy risk is given. When this penalty term times a suitable multiplier and the cross-entropy empirical risk are minimized jointly, the problem is equivalent to optimizing the information-bottleneck objective with respect to the empirical data distribution. This result provides an interesting connection between mutual information and generalization, and helps to explain why noise injection during the training phase can improve the generalization ability of encoder models and enforce invariances in the resulting representations.
Keywords
- Type
- Chapter
- Information
- Information-Theoretic Methods in Data Science , pp. 330 - 358Publisher: Cambridge University PressPrint publication year: 2021
References
- 1
- Cited by