Solomonoff Prediction and Occam’s Razor

Published online by Cambridge University Press: 01 January 2022

Abstract

Algorithmic information theory gives an idealized notion of compressibility that is often presented as an objective measure of simplicity. It is sometimes suggested that Solomonoff prediction, or algorithmic information theory in a predictive setting, can deliver an argument that justifies Occam’s razor. This article explicates the relevant argument and, by recasting it in a Bayesian framework, reveals why it has no such justificatory force. The supposed simplicity concept is better understood as a specific inductive assumption, the assumption of effectiveness. It is this assumption that characterizes Solomonoff prediction, and it is where its philosophical interest lies.
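To make the Bayesian reading concrete, here is a toy sketch in Python; it is an illustration under stated assumptions, not the article's own construction. It assumes a small, hand-picked class of computable hypotheses about a binary sequence, each tagged with a made-up description length L in bits and given prior weight 2^-L. Solomonoff's universal predictor can instead be written, up to a multiplicative constant, as a mixture over all lower-semicomputable semimeasures, whose true program lengths are uncomputable. The names `mixture_predict` and `HYPOTHESES` are hypothetical. What the sketch shows is that, once written this way, the length-based weighting plays the role of an ordinary prior in a Bayesian mixture.

```python
from fractions import Fraction
from typing import Callable, List, Sequence, Tuple

# A hypothesis maps the bits seen so far to P(next bit = 1).
Hypothesis = Callable[[Sequence[int]], Fraction]

# Hypothetical toy class: (assumed description length in bits, hypothesis).
# The lengths are illustrative stand-ins for program lengths, not computed.
HYPOTHESES: List[Tuple[int, Hypothesis]] = [
    (3, lambda hist: Fraction(0)),              # always 0
    (3, lambda hist: Fraction(1)),              # always 1
    (5, lambda hist: Fraction(len(hist) % 2)),  # alternates 0, 1, 0, 1, ...
    (2, lambda hist: Fraction(1, 2)),           # fair coin
]


def mixture_predict(hypotheses: List[Tuple[int, Hypothesis]],
                    history: Sequence[int]) -> Fraction:
    """Bayesian mixture probability that the next bit is 1."""
    numerator = Fraction(0)
    denominator = Fraction(0)
    for length, h in hypotheses:
        weight = Fraction(1, 2 ** length)  # prior weight 2^-L
        # Multiply in the likelihood of each observed bit under h.
        for t, bit in enumerate(history):
            p_one = h(history[:t])
            weight *= p_one if bit == 1 else 1 - p_one
        numerator += weight * h(history)   # posterior-weighted prediction
        denominator += weight              # unnormalized evidence
    if denominator == 0:
        raise ValueError("no hypothesis is consistent with the history")
    return numerator / denominator


if __name__ == "__main__":
    print(mixture_predict(HYPOTHESES, [0, 1, 0, 1]))  # 1/6
```

With no data the predicted probability of a 1 is 8/17. After observing 0101 the alternating hypothesis dominates the posterior and the probability of a 1 falls to 1/6; after 01010101 it is 1/66. Exact `Fraction` arithmetic is used so these values can be checked by hand; changing the assumed lengths or the hypothesis class changes the predictions.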

Type: Research Article
Copyright © The Philosophy of Science Association

Footnotes

For valuable feedback on several versions and presentations of this article, I am indebted to Peter Grünwald, Jan-Willem Romeijn, the members of the Groningen PCCP seminar, Simon Huttegger, Hannes Leitgeb, Samuel Fletcher, Filippo Massari, Teddy Seidenfeld, and an anonymous referee. This research was supported by NWO Vici project 639.073.904.