Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- Notation
- Contributors
- 1 Introduction to Information Theory and Data Science
- 2 An Information-Theoretic Approach to Analog-to-Digital Compression
- 3 Compressed Sensing via Compression Codes
- 4 Information-Theoretic Bounds on Sketching
- 5 Sample Complexity Bounds for Dictionary Learning from Vector- and Tensor-Valued Data
- 6 Uncertainty Relations and Sparse Signal Recovery
- 7 Understanding Phase Transitions via Mutual Information and MMSE
- 8 Computing Choice: Learning Distributions over Permutations
- 9 Universal Clustering
- 10 Information-Theoretic Stability and Generalization
- 11 Information Bottleneck and Representation Learning
- 12 Fundamental Limits in Model Selection for Modern Data Analysis
- 13 Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits
- 14 Distributed Statistical Inference with Compressed Data
- 15 Network Functional Compression
- 16 An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation
- Index
- References
12 - Fundamental Limits in Model Selection for Modern Data Analysis
Published online by Cambridge University Press: 22 March 2021
Summary
With rapid developments in hardware storage, precision instrument manufacturing, economic globalization, and related areas, data in various forms have become ubiquitous in human life. This enormous amount of data can be a double-edged sword. While it makes it possible to model the world with higher fidelity and greater flexibility, improper modeling choices can lead to false discoveries, misleading conclusions, and poor predictions. Typical data-mining, machine-learning, and statistical-inference procedures learn from and make predictions on data by fitting parametric or non-parametric models. However, no single model is universally suitable for all datasets and goals. A crucial step in data analysis is therefore to consider a set of postulated candidate models and learning methods (the model class) and to select the most appropriate one. We provide an integrated discussion of the fundamental limits of inference and prediction based on model-selection principles from modern data analysis. In particular, we introduce two recent advances in model-selection approaches, one concerning a new information criterion and the other concerning modeling-procedure selection.
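As a concrete illustration of selecting from a model class with an information criterion (the chapter's general theme; the specific criterion it develops is new, so a classical one is used here), the sketch below fits polynomials of increasing degree to hypothetical synthetic data and picks the degree minimizing the Bayesian information criterion (BIC). All data and parameter choices are assumptions for the example, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a cubic signal observed with Gaussian noise.
n = 200
x = np.linspace(-1.0, 1.0, n)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.1, size=n)

def bic(degree):
    """BIC (up to additive constants) of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = np.mean(resid**2)     # maximum-likelihood noise-variance estimate
    k = degree + 2                 # polynomial coefficients + noise variance
    # Gaussian log-likelihood contributes -n/2 * log(sigma2); BIC = -2*loglik + k*log(n)
    return n * np.log(sigma2) + k * np.log(n)

degrees = range(8)
scores = {d: bic(d) for d in degrees}
best = min(scores, key=scores.get)
print("selected degree:", best)
```

Richer models always reduce the residual variance, so the fit term alone would always prefer the highest degree; the `k * log(n)` penalty is what lets the criterion reject candidates whose extra flexibility only chases noise.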
- Type: Chapter
- Information: Information-Theoretic Methods in Data Science, pp. 359–382. Publisher: Cambridge University Press. Print publication year: 2021