Philosophy and Machine Learning
Marcello Pelillo · Joachim M Buhmann · Tiberio Caetano · Bernhard Schölkopf · Larry Wasserman

Sat Dec 17th 07:30 AM -- 08:00 PM @ Melia Sierra Nevada: Hotel Bar

The fields of machine learning and pattern recognition can arguably be considered a modern-day incarnation of an endeavor which has challenged mankind since antiquity. In fact, fundamental questions pertaining to categorization, abstraction, generalization, induction, etc., have been on the agenda of mainstream philosophy, under different names and guises, since its inception. With the advent of modern digital computers and the availability of enormous amounts of raw data, these questions have now taken on a computational flavor: instead of asking, say, "What is a dog?", we have started asking "How can one recognize a dog?" or, more technically, "What is an algorithm to recognize a dog?". Indeed, it has even been maintained that for a philosophical theory of knowledge to be respectable, it has to be described in computational terms (Thagard, 1988).

As often happens in scientific research, in the early days of machine learning and pattern recognition there was genuine interest in philosophical and conceptual issues (see, e.g., Minsky, 1961; Sutherland, 1968; Watanabe, 1969; Bongard, 1970; Nelson, 1976; Good, 1983), but over time the interest shifted almost entirely to technical and algorithmic aspects, and became driven mainly by practical applications. With this reality in mind, it is instructive to remark that although setting philosophical inquiry aside during periods of intense incremental scientific progress is understandable, given the immediate needs of problem-solving, it is also sometimes responsible for preventing or delaying the emergence of true scientific progress (Kuhn, 1962).

There are several points of contact between philosophy, machine learning, and pattern recognition worth exploiting. To begin, as pointed out by Duda, Hart, and Stork (2000), the very foundations of pattern recognition can be traced back to early Greek philosophers who distinguished an “essential property” from an “accidental property” of an object, so that the whole field of pattern recognition can naturally be cast as the problem of finding such essential properties of a category. As a matter of fact, during the past centuries several varieties of "essentialism" have been put forward, and it is not clear which one, if any, is being used by present-day pattern recognition research (see Gelman, 2003, for a developmental psychology perspective). Interestingly, in modern times, the essentialist assumption itself has been vigorously challenged (see, e.g., James, 1890/1983; Wittgenstein, 1953; Rorty, 1979), giving rise to a relativistic position which denies the existence of essences, thereby suggesting a relational view which is reminiscent of modern link-oriented approaches to social network analysis (Kleinberg, 1998; Easley and Kleinberg, 2010) as well as of kernel- and purely similarity-based approaches to pattern analysis and recognition (see, e.g., Schölkopf and Smola, 2001; Shawe-Taylor and Cristianini, 2004;

Besides the representation problem alluded to above, another all-important philosophical issue related to the machine learning endeavor concerns the very process of inference, and hence its connections to the philosophy of science. In fact, there are such striking analogies between the two disciplines that it has even been maintained that machine learning should be regarded as "experimental philosophy of science" (Korb, 2004). This is motivated by the observation that at the very heart of both fields there lies the notion of an inductive strategy (by way of algorithms or as they appear in scientific practice), and that hypothesis choice in science is akin to model selection in machine learning (but see Williamson, 2009, for a more elaborate position). The connection with the philosophy of science touches upon such fundamental topics as the foundations of probability (Savage, 1972), Bayesianism and causality (Spirtes, Glymour, and Scheines, 2001; Bovens and Hartmann, 2004; Pearl, 2009; Koller and Friedman, 2009), inductionism vs. falsificationism (Popper, 1959; Lakatos, 1970), etc., each of which is on the agenda of present-day machine learning research.

Other fundamental topics which lie at the intersection of philosophy, machine learning and pattern recognition (and cognitive science as well) include: the nature of similarity and categorization (e.g., Quine, 1969; Goodman, 1972; Tversky, 1977; Lakoff, 1987; Eco, 2000; Hahn and Ramscar, 2001), (causal) decision theory (Lewis, 1981; Skyrms, 1980; Joyce, 1999), game theory (Nozick, 1994; Fudenberg and Levine, 1998; Shafer and Vovk, 2001; Cesa-Bianchi and Lugosi, 2006; Shoham and Leyton-Brown, 2009; Skyrms, 2010), and the nature of information (Watanabe, 1969; Hintikka and Suppes, 1970; Adams, 2003; Skyrms, 2010; Floridi, 2011).

In recent years there has been increasing interest in the foundational and/or philosophical problems of machine learning and pattern recognition, from both the computer scientists' and the philosophers' camps. We mention, for example, Bob Williamson's project of "reconceiving machine learning", the NIPS'09 workshop on "Clustering: Science or art?" and the associated manifesto (von Luxburg, Williamson, and Guyon, 2011), the recent MIT Press book by Gilbert Harman (a philosopher) and S. Kulkarni (an engineer) on reliable inductive reasoning (Harman and Kulkarni, 2007), the ECML'2001 workshop on "Machine learning as experimental philosophy of science" with the associated special issue of Minds and Machines (vol. 14, no. 4, 2004), the work of P. Thagard on "computational philosophy of science" (Thagard, 1988, 1990), Corfield et al.'s study on the connection between the Popper and the VC dimensions (Corfield, Schölkopf, and Vapnik, 2009), von Luxburg and Schölkopf's contribution to the Handbook of the History of Logic (von Luxburg and Schölkopf, 2011), Halpern and Pearl's philosophical study on "causes and explanations" (Halpern and Pearl, 2005), and O. Bousquet's blog on "machine learning thoughts".

This suggests that the time is ripe to attempt to establish a long-term dialogue between the philosophy and the machine learning communities with a view to fostering cross-fertilization of ideas. In particular, we feel the present moment is appropriate for reflection, reassessment and eventually some synthesis, with the aim of providing the machine learning field with a self-portrait of where it currently stands and where it is going as a whole, and hopefully suggesting new directions. The aim of this workshop is precisely to consolidate research efforts in this area, and to provide an informal discussion forum for researchers and practitioners interested in this important yet diverse subject.

Accordingly, topics of interest include (but are not limited to):

- connections to epistemology and philosophy of science (inductionism, falsificationism, etc.)
- essentialism vs anti-essentialism (e.g., feature-based vs similarity/relational approaches)
- foundations of probability and causality (Bayesianism, etc.)
- abstraction and generalization
- connections to decision and game theory
- similarity and categorization
- the nature of information


Adams, F. (2003). The informational turn in philosophy. Minds and Machines 13(4):471-501.

Bongard, M. M. (1970). Pattern Recognition. Spartan Books, New York (original published in Russian in 1967).

Bovens, L., and Hartmann, S. (2004). Bayesian Epistemology. Oxford University Press, Oxford, UK.

Cesa-Bianchi, N., and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press, Cambridge, UK.

Corfield, D., Schölkopf, B., and Vapnik, V. (2009). Falsificationism and statistical learning theory: Comparing the Popper and the Vapnik-Chervonenkis dimensions. J. Gen. Phil. Sci. 40:51-58.

Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification. John Wiley & Sons, New York.

Easley, D., and Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, Cambridge, UK.

Eco, U. (2000). Kant and the Platypus: Essays on Language and Cognition. Harvest Books.

Floridi, L. (2011). The Philosophy of Information. Oxford University Press, Oxford, UK.

Fudenberg, D., and Levine, D. K. (1998). The Theory of Learning in Games. MIT Press, Cambridge, MA.

Gelman, S. A. (2003). The Essential Child: Origins of Essentialism in Everyday Thought. Oxford University Press, New York.

Good, I. J. (1983). The philosophy of exploratory data analysis. Phil. Sci. 50(2):283-295.

Goodman, N. (1972). Seven strictures on similarity. In: N. Goodman (Ed.), Problems and Projects. Bobbs-Merrill, Indianapolis.

Hahn, U., and Ramscar, M. (Eds.) (2001). Similarity and Categorization. Oxford University Press, Oxford, UK.

Halpern, J., and Pearl, J. (2005). Causes and explanations: A structural-model approach. British J. Phil. Sci. 56:843-911.

Harman, G., and Kulkarni, S. (2007). Reliable Reasoning: Induction and Statistical Learning Theory. MIT Press, Cambridge, MA.

Hintikka, J., and Suppes, P. (Eds.) (1970). Information and Inference. Springer, Berlin.

James, W. (1983). The Principles of Psychology. Harvard University Press, Cambridge, MA (Originally published in 1890).

Joyce, J. (1999). The Foundations of Causal Decision Theory. Cambridge University Press, Cambridge, UK.

Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA.

Korb, K. (2004). Introduction: Machine learning as philosophy of science. Minds and Machines 14(4).

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In: I. Lakatos and A. Musgrave (Eds.), Criticism and the Growth of Knowledge. Cambridge University Press, Cambridge, UK.

Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. The University of Chicago Press.

Lewis, D. (1981). Causal decision theory. Australasian J. Phil. 59:5-30.

Minsky, M. (1961). Steps toward artificial intelligence. Proc. IRE 49:8-30.

Nelson, R. J. (1976). On mechanical recognition. Phil. Sci. 43(1):24-52.

Nozick, R. (1994). The Nature of Rationality. Princeton University Press, Princeton, NJ.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK (2nd edition).

Popper, K. R. (1959). The Logic of Scientific Discovery. Hutchinson & Co. (Originally published in German in 1935).

Quine, W. V. O. (1969). Natural kinds. In: Ontological Relativity and Other Essays. Columbia University Press.

Rorty, R. (1979). Philosophy and the Mirror of Nature. Princeton University Press, Princeton, NJ.

Savage, L. (1972). The Foundations of Statistics. Dover, New York (2nd edition).

Schölkopf, B., and Smola, A. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.

Shafer, G., and Vovk, V. (2001). Probability and Finance: It's Only a Game. John Wiley & Sons, New York.

Shawe-Taylor, J., and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK.

Shoham, Y., and Leyton-Brown, K. (2009). Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, UK.

Skyrms, B. (1980). Causal Necessity: A Pragmatic Investigation of the Necessity of Laws. Yale University Press, New Haven, CT.

Skyrms, B. (2010). Signals: Evolution, Learning and Information. Oxford University Press, Oxford, UK.

Spirtes, P., Glymour, C., and Scheines, R. (2001). Causation, Prediction, and Search. MIT Press, Cambridge, MA.

Sutherland, N. S. (1968). Outlines of a theory of visual pattern recognition in animals and man. Proc. Royal Soc. B 171:297-317.

Thagard, P. (1988). Computational Philosophy of Science. MIT Press, Cambridge, MA.

Thagard, P. (1990). Philosophy and machine learning. Canad. J. Phil. 20(2):261-276.

Tversky, A. (1977). Features of similarity. Psychol. Rev. 84(4):327-352.

von Luxburg, U., and Schölkopf, B. (2011). Statistical learning theory: Models, concepts, and results. In: D. Gabbay, S. Hartmann, and J. Woods (Eds.), Handbook of the History of Logic, vol. 10: Inductive Logic, pp. 651-706. Elsevier.

von Luxburg, U., Williamson, R. C., and Guyon, I. (2011). Clustering: Science or art?

Watanabe, S. (1969). Knowing and Guessing: A Quantitative Study of Inference and Information. John Wiley & Sons, New York.

Williamson, J. (2009). The philosophy of science and its relation to machine learning. In: M. M. Gaber (Ed.), Scientific Data Mining and Knowledge Discovery: Principles and Foundations. Springer, Berlin.

Wittgenstein, L. (1953). Philosophical Investigations. Blackwell Publishers.

Author Information

Marcello Pelillo (Università Ca' Foscari di Venezia)
Joachim M Buhmann (ETH Zurich)
Tiberio Caetano (NICTA Canberra)
Bernhard Schölkopf (MPI for Intelligent Systems)

Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see

Larry Wasserman (Carnegie Mellon University)
