Induction and Hume's Problem: The Foundational Challenge

The Puzzle

The sun has risen every day in recorded history. Bread has nourished every healthy person who has ever eaten it. Iron has corroded in the presence of oxygen and water under the conditions we have observed. Every emerald examined to date has been green. Each of these is a past pattern. From each, we are inclined to infer a future case: the sun will rise tomorrow, bread will continue to nourish us, iron will continue to corrode, the next emerald examined will be green.

Such inferences are inductive. They go beyond the evidence: the past pattern does not deductively entail the future case. There is no logical contradiction in the sun failing to rise tomorrow, in bread suddenly being toxic, or in the next emerald being blue. The inference is non-deductive. It must rest on some additional principle.

David Hume's question, posed in A Treatise of Human Nature (1739) and again in An Enquiry Concerning Human Understanding (1748), is sharper than it sounds:

What is the rational justification for the inductive principle?

Hume's answer was: there isn't one. Any justification for induction must be either deductive or inductive. A deductive justification is impossible, because no contradiction follows from supposing that the future differs from the past. An inductive justification is circular: the argument "induction has worked in the past, therefore it will work in the future" is itself an inductive inference. In the roughly 280 years since, the argument has drawn many kinds of response but no generally accepted refutation.

This is Hume's problem of induction, sometimes called the problem of induction without qualifier. It is the deepest single challenge in the philosophy of empirical knowledge.

This page assumes What Is Epistemology? and What Is Logic?. It is also the philosophical background for the Empiricism essay in this series; the formal version of the problem appears in the Empiricism, Induction, and the Limits of LLM Generalization essay, which applies it directly to machine learning.

Reconstructed Formally

Hume's argument is most clearly presented as a numbered argument over two horns of a dilemma.

P1. Any justification of induction must be either demonstrative (deductive) or probable (inductive).

P2. No demonstrative justification works: from the premise that past instances of $A$ have been followed by $B$, no deductive inference yields the conclusion that future instances will be. The proposition "the future will resemble the past" is not a logical truth; its negation involves no contradiction.

P3. No probable justification works: any inductive justification of induction would itself rely on inferring future success from past success, which is exactly the inductive principle in question. Such a justification is circular.

P4. Therefore (from P1, P2, P3) induction has no rational justification.

C. Inductive belief is therefore not the product of reason. It is the product of custom or habit: the mind's natural tendency, after observing constant conjunction, to expect the second event when the first occurs.

Hume does not conclude that we should not engage in induction. He concludes that induction is not justified by reason; it is a feature of human cognition rather than an exercise of rational warrant. The naturalistic conclusion: empirical reasoning is psychologically inevitable but not epistemically licensed in the way deductive reasoning is.

The structure deserves emphasis. Hume's argument is deductively valid. The challenge to it must come from one of the premises. Each major response in modern philosophy attempts to weaken or reinterpret one of P1, P2, P3.
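The validity claim can be checked mechanically. The following Lean 4 sketch is a propositional simplification, not Hume's own formulation: the letters J, D, and I are labels introduced here for "induction has a rational justification", "a demonstrative justification works", and "a probable justification works". Under that reading, P4 follows from P1-P3 by disjunction elimination.

```lean
-- Assumed propositional reading (labels are illustrative, not Hume's):
--   J : induction has a rational justification
--   D : a demonstrative (deductive) justification works
--   I : a probable (inductive) justification works
theorem hume_dilemma (J D I : Prop)
    (p1 : J → D ∨ I)   -- P1: any justification is demonstrative or probable
    (p2 : ¬D)          -- P2: no demonstrative justification works
    (p3 : ¬I)          -- P3: no probable justification works
    : ¬J :=            -- P4: induction has no rational justification
  fun hJ => (p1 hJ).elim p2 p3
```

The proof uses nothing beyond the three premises, which is why the sketch locates every possible objection in P1, P2, or P3 rather than in the inference itself.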

Why the Argument Bites

Three concrete cases.

Case 1: a calibrated forecast. A forecaster predicts that the next coin flip will land heads with probability 0.5. The prediction relies on a long history of fair-coin behavior, and the forecaster's calibration on past flips has been excellent. But the prediction depends on assuming that the conditions producing past 0.5-frequency outcomes will persist into the next flip. If that assumption fails (perhaps the coin has been replaced with a biased one), the forecast is wrong. The reliability of past calibration cannot, by Hume's argument, license the future prediction.

Case 2: a medical extrapolation. A drug has been tested in clinical trials and shown to be effective. The trials were rigorous; the statistical analysis was correct. We now use the drug on new patients. The inference from "effective in trials" to "effective for the next patient" is inductive. Hume's argument applies directly: no contradiction follows from the drug failing on the new patient (different metabolism, different population, different age, different comorbidity). The empirical evidence licenses confidence about the conditions of the trial; it does not license confidence about conditions outside it.

Case 3: a machine-learning model. A model trained on a dataset performs well on a held-out validation set. We deploy the model. The inference from "performs well on held-out data" to "performs well in production" is inductive. Hume's argument applies again: no contradiction follows from production data differing from validation data. This is the technical content of the Empiricism essay: scaling-law evidence is strongest under the training distribution; deployment requires assumptions about the relationship between training distribution and deployment distribution that scaling alone cannot establish.

In each case, Hume's argument is not abstract. It identifies the actual gap between past evidence and future application. Modern philosophy of science, modern statistics, and modern machine learning are all responses to that gap.

Major Modern Responses

The 280 years since Hume have produced several major lines of response. Each reinterprets one of the premises rather than refuting the argument as Hume gave it.

Reichenbach's Pragmatic Vindication (1949)

Hans Reichenbach proposed a pragmatic defense of induction.¹ The argument: even granting Hume that induction cannot be deductively justified, we can show that induction is the best available method for the kind of question induction tries to answer.

The argument structure (simplified): consider the claim "the long-run relative frequency of a property in a sequence converges to a limit." If that claim is true, induction (specifically, taking the past frequency as the estimate of the future frequency) will converge to the true limit. If the claim is false (the limit does not exist), no method whatever can discover the limit, since there is no limit to discover. Therefore induction is as good as any method if a stable limit exists, and no method can do better if it does not.

This is not a justification of induction in Hume's sense. Reichenbach grants Hume that we cannot prove induction succeeds. He argues only that if anything can succeed, induction will. We are licensed to use induction not because it is rational in the deductive sense but because the alternatives are no better and the cost of using a worse method (when induction would have worked) is high.

Where it falls short. The vindication is conditional: it shows that induction succeeds if a limiting relative frequency exists, and the assumption that one exists is itself inductive. It also gives no guidance about how soon induction will converge, which matters for any practical application. The vindication is real but limited.
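A small simulation makes both the vindication and its limits concrete. The sketch below assumes a sequence that does have a limiting frequency (independent draws with a fixed success probability, an assumption made purely for illustration) and applies the rule of taking the observed relative frequency as the estimate. The function names are illustrative.

```python
import random


def bernoulli_sequence(p, n, seed):
    """Generate n independent 0/1 outcomes with success probability p.

    Assumption for illustration: a stable limiting frequency p exists.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


def straight_rule_estimates(sequence):
    """Return the running relative frequency of 1s after each observation."""
    estimates = []
    successes = 0
    for n, outcome in enumerate(sequence, start=1):
        successes += outcome
        estimates.append(successes / n)
    return estimates


if __name__ == "__main__":
    seq = bernoulli_sequence(p=0.3, n=100_000, seed=0)
    est = straight_rule_estimates(seq)
    # The estimate approaches 0.3 in the long run, but nothing in the rule
    # says how many observations are enough for a given accuracy.
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>6}  estimate={est[n - 1]:.4f}")
```

The early estimates can sit well away from the limit, which is the practical content of the complaint that the vindication is silent about convergence speed.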

Goodman's New Riddle of Induction (1955)

Nelson Goodman raised a different challenge.² Suppose Hume's problem is somehow resolved and we grant the legitimacy of induction in the abstract. We still face the question: which generalizations should we project from past data?

The famous example. Define a predicate grue: an object is grue iff it is observed before time $t$ and is green, or it is observed after time $t$ and is blue. Until time $t$, every emerald observed has been both green and grue: green by ordinary inspection, grue by definition.

Two inductive generalizations are equally well-supported by the evidence:

All emeralds are green. (After $t$, the next emerald will be green.)
All emeralds are grue. (After $t$, the next emerald will be blue.)

The two predictions are incompatible. They cannot both be right. Yet the past data is consistent with both generalizations. Which one should we project?

Goodman's point: the data alone does not determine the answer. The decision between green and grue (and the infinitely many similar manufactured predicates) depends on which predicates are projectible: which generalizations the data licenses. Projectibility is not a logical property; it is a feature of our background theory, language, scientific community, and practical experience.
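The underdetermination can be stated as a few lines of code. This is a minimal sketch: the cutoff `T`, the integer time stamps, and the convention that an observation at exactly `T` counts as "after" are all assumptions introduced for illustration.

```python
T = 100  # hypothetical cutoff time t (an assumption for illustration)


def green_hypothesis(observed_at):
    """'All emeralds are green': the predicted colour never depends on time."""
    return "green"


def grue_hypothesis(observed_at):
    """'All emeralds are grue': green if observed before T, blue otherwise."""
    return "green" if observed_at < T else "blue"


def consistent(hypothesis, observations):
    """True if the hypothesis agrees with every (time, colour) observation."""
    return all(hypothesis(t) == colour for t, colour in observations)


# Every emerald observed so far (all before T) has been green.
observations = [(t, "green") for t in range(T)]

if __name__ == "__main__":
    print(consistent(green_hypothesis, observations))  # both fit the past
    print(consistent(grue_hypothesis, observations))
    print(green_hypothesis(T + 1), grue_hypothesis(T + 1))  # they diverge after T
```

Both hypotheses pass the same consistency check on all available data, so no test run on past observations can separate them.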

The new riddle. Hume's old riddle: how can past data license future predictions at all? Goodman's new riddle: even granting that past data licenses some predictions, how do we decide which predictions? The decision requires substantive theoretical commitments that the data alone does not provide.

This is not just a philosophical curiosity. In machine learning, choosing a hypothesis class (linear models, neural networks of a given architecture, kernel SVMs) is exactly choosing which generalizations to project. Different hypothesis classes have different inductive biases; different inductive biases produce different predictions on the same data. The grue paradox is the philosophical foundation of this concern.
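The same point can be sketched with two toy hypothesis classes. The example below is an illustration under assumptions chosen here (three training points, a least-squares line, and a 1-nearest-neighbour rule), not a claim about any particular ML system: both learners reproduce the training data exactly and still disagree on an unseen input.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept; returns a predictor."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_neighbour(points):
    """1-nearest-neighbour: predict the y value of the closest training x."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


# Illustrative training data: both hypothesis classes fit it perfectly.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
line = fit_line(data)
nn = fit_nearest_neighbour(data)

if __name__ == "__main__":
    for x, y in data:
        print(x, y, line(x), nn(x))  # identical on the training inputs
    # Off the training range the inductive biases show: the line
    # extrapolates the trend, the neighbour rule repeats the last value.
    print(line(5.0), nn(5.0))
```

The linear class projects "the trend continues"; the nearest-neighbour class projects "nearby cases are alike". The data cannot say which projection is right.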

Bayesian Conditionalization

The dominant formal response is Bayesian. Treat belief as graded credence, treat evidence as conditioning information, and update by Bayes's rule. The framework is covered in detail at Bayesian Epistemology.

The Bayesian response to Hume's problem (developed by de Finetti and Carnap, and later by Howson and Urbach 1989):³ induction is justified given a prior probability distribution over hypotheses. The probability of each hypothesis is updated by Bayes's rule on each piece of evidence. Hume's circularity problem dissolves in form: Bayesian conditionalization is mathematically derivable from the axioms of probability and is not an independent inductive principle.

What this requires and what it does not. The Bayesian response works only relative to a prior. The choice of prior is itself a substantive commitment that the formalism does not justify. Different priors produce different posterior credences; the Bayesian framework codifies the relationship between prior, evidence, and posterior, but it does not select the prior.

Where it falls short. The prior problem is the new form of Hume's problem. We have replaced "why does past evidence license future inference?" with "why is this prior more rational than that one?" In some applications, weak priors are reasonable and convergence under broad classes of priors is provable (the de Finetti / Doob theorems). In others, the prior dominates the posterior and the choice is doing real work. Bayesian conditionalization sharpens Hume's problem; it does not dissolve it.
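The two regimes can be seen in a conjugate Beta-Bernoulli update, where the posterior mean has a closed form. This is a minimal sketch; the two priors and the data counts are assumptions chosen for illustration.

```python
def posterior_mean(prior_alpha, prior_beta, successes, failures):
    """Posterior mean of a Bernoulli parameter under a Beta(alpha, beta) prior.

    Conjugacy: the posterior is Beta(alpha + successes, beta + failures).
    """
    return (prior_alpha + successes) / (
        prior_alpha + prior_beta + successes + failures
    )


# Two hypothetical agents with different priors (illustrative choices).
UNIFORM = (1.0, 1.0)     # Beta(1, 1): no initial preference
SCEPTICAL = (1.0, 20.0)  # Beta(1, 20): strongly expects failure

if __name__ == "__main__":
    for successes, failures in ((4, 1), (4_000, 1_000)):
        uniform = posterior_mean(*UNIFORM, successes, failures)
        sceptical = posterior_mean(*SCEPTICAL, successes, failures)
        print(
            f"data={successes}/{successes + failures}  "
            f"uniform={uniform:.3f}  sceptical={sceptical:.3f}"
        )
```

With five observations the two agents disagree sharply and the prior is doing the work; with five thousand, the same evidence has brought them close together. The formalism describes both cases and chooses between the priors in neither.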

Naturalized Epistemology (Quine 1969)

W. V. O. Quine took a different route.⁴ Stop asking for normative justification of induction; just describe how human cognition does it. Epistemology becomes a chapter of psychology.

Quine grants that Hume's argument is correct on its own terms: induction has no non-circular justification. But this fact, he argues, is no different from the fact that there is no non-circular justification of deduction either (any justification of modus ponens uses modus ponens). The demand for a justification that is "outside" all forms of inference is incoherent. Instead, study induction the way we study any cognitive capacity: empirically, scientifically, as a feature of how minds actually reason.

Where it falls short. The naturalized view changes the subject. Hume asked a normative question (is induction rationally justified?). Quine answers a descriptive question (how do humans engage in induction?). Many epistemologists find this an evasion rather than an answer. The descriptive project is valuable, but it does not silence the normative question.

Statistical Learning Theory and PAC Learning

The most technically sophisticated modern response comes from statistical learning theory: Probably Approximately Correct (PAC) learning bounds (Valiant 1984)⁵ and the related Vapnik-Chervonenkis (VC) theory.

The framework. Suppose we have:

A hypothesis class $H$ of functions from inputs to outputs.
A training set $S$ of $m$ samples drawn i.i.d. from a distribution $D$.
A learning algorithm that picks $\hat{h} \in H$ minimizing empirical error on $S$.

A PAC bound has the form: with probability at least $1 - \delta$ over the choice of training set, the empirical error on $S$ and the true error on $D$ satisfy

$$P_D(\text{error}) \leq P_S(\text{error}) + \epsilon(m, H, \delta),$$

where $\epsilon$ depends on the sample size $m$, the complexity of $H$ (often via VC dimension), and the confidence level $\delta$. As $m$ grows, $\epsilon$ shrinks. The bound is non-trivial: it gives a quantitative guarantee, in probability, on how much the empirical error underestimates the true error.
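To make the dependence of $\epsilon$ on $m$, $H$, and $\delta$ concrete, the sketch below uses one standard special case rather than the general VC bound: a finite hypothesis class and a loss in $[0, 1]$, where Hoeffding's inequality plus a union bound gives $\epsilon = \sqrt{(\ln|H| + \ln(1/\delta)) / (2m)}$. The function names are illustrative.

```python
import math


def finite_class_epsilon(m, hypothesis_count, delta):
    """Gap term for a finite hypothesis class (Hoeffding + union bound).

    Assumes i.i.d. samples and a loss bounded in [0, 1]. With probability
    at least 1 - delta, true error <= empirical error + epsilon for every
    hypothesis in the class simultaneously.
    """
    return math.sqrt(
        (math.log(hypothesis_count) + math.log(1.0 / delta)) / (2.0 * m)
    )


def samples_needed(epsilon, hypothesis_count, delta):
    """Smallest m for which the bound above guarantees a gap of at most epsilon."""
    return math.ceil(
        (math.log(hypothesis_count) + math.log(1.0 / delta)) / (2.0 * epsilon**2)
    )


if __name__ == "__main__":
    # Illustrative numbers: 1000 hypotheses, 95 percent confidence.
    for m in (100, 1_000, 10_000, 100_000):
        print(f"m={m:>7}  epsilon={finite_class_epsilon(m, 1_000, 0.05):.4f}")
    print(samples_needed(0.05, 1_000, 0.05))
```

The gap shrinks like $1/\sqrt{m}$ and grows only logarithmically in the size of the class, which is the quantitative sense in which more data and a simpler hypothesis class both tighten the guarantee.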

This is a real partial response to Hume. It says: under the i.i.d. assumption (training data and test data are drawn from the same distribution), induction succeeds with quantifiable probability and quantifiable error rate. The Humean question "why should past success license future success?" has a precise answer in this framework: because the two are statistically related under the i.i.d. assumption, and the relationship is provable.

What it does not solve. The i.i.d. assumption is itself an inductive claim about the relationship between past data and future data. PAC bounds do not justify this assumption; they are conditional on it. When the deployment distribution differs from the training distribution (the standard worry about generalization beyond the training distribution), the PAC bounds break, and Hume's problem reappears in its original form.
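A toy simulation shows the conditional character of the guarantee. Everything in the sketch is an assumption made for illustration: one-dimensional inputs drawn uniformly from $[-3, 3]$, labels given by a threshold rule, a learner that picks the best threshold from a fixed grid, and a deployment setting in which the labeling threshold has moved from 0 to 1.

```python
import random


def sample(n, boundary, rng):
    """Draw n points x ~ Uniform(-3, 3) labeled 1 iff x > boundary."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [(x, 1 if x > boundary else 0) for x in xs]


def error_rate(threshold, data):
    """Fraction of points misclassified by the rule 'predict 1 iff x > threshold'."""
    wrong = sum(1 for x, y in data if (1 if x > threshold else 0) != y)
    return wrong / len(data)


def erm_threshold(train):
    """Empirical risk minimization over a grid of candidate thresholds."""
    candidates = [i / 10 for i in range(-30, 31)]
    return min(candidates, key=lambda t: error_rate(t, train))


if __name__ == "__main__":
    rng = random.Random(0)
    train = sample(500, boundary=0.0, rng=rng)
    iid_test = sample(2_000, boundary=0.0, rng=rng)      # same distribution
    shifted_test = sample(2_000, boundary=1.0, rng=rng)  # labeling rule has moved
    h = erm_threshold(train)
    print("learned threshold:", h)
    print("train error:      ", error_rate(h, train))
    print("i.i.d. test error:", error_rate(h, iid_test))
    print("shifted error:    ", error_rate(h, shifted_test))
```

On the i.i.d. test set the empirical error is a good guide to the true error, as the bound promises. On the shifted set the training error says nothing useful, and no amount of additional training data from the original distribution would change that.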

PAC learning is the most technically successful partial response to Hume. It does not refute the original argument; it gives precise content to what can be established from empirical data, and equally precise content to what cannot.

What the Problem Forces

The right response to Hume is not "the problem is solved" but "we now know what the problem is exactly":

Demonstrative justification of induction is impossible. Hume's P2 stands.
Probable justification of induction is circular unless we relativize to a framework (a prior, an i.i.d. assumption, a hypothesis class). Hume's P3 stands in absolute form but is dissolved in relativized form.
Within a framework, induction is well-behaved: Bayesian conditionalization, PAC bounds, and reflective equilibrium all give content to "what evidence licenses what conclusion."
Across frameworks, the choice of which framework is the deeper question. Goodman's grue is the canonical reminder that this choice is substantive.

The problem of induction has not been solved. It has been factored: the part within a framework is tractable; the part about choosing the framework remains hard.

Connection to Machine Learning

The Empiricism essay in this series makes the connection explicit. Modern ML is statistical induction at scale. Every concern Hume raised has a precise modern echo:

Hume's concern	Modern ML correlate
Justification of induction	Why empirical risk minimization works
Uniformity of nature	The i.i.d. assumption
Future may not resemble past	Distribution shift between train and test
Choice of generalization	Inductive bias of the hypothesis class
New riddle (grue)	Choice of architecture, regularization, prior
Past success licenses future	PAC bounds
Where past success runs out	Failure under distribution shift

The most actively studied of these concerns in current ML is out-of-distribution generalization: the question of how well models trained under one distribution perform under a different distribution. Recht et al. 2019 reconstructed ImageNet test sets and found significant accuracy drops on the reconstructions, even though improvements on the original benchmark still tended to transfer.⁶ The result is precisely Humean: the past evidence (accuracy on the original test set) gave no guarantee about the new evidence (accuracy on the reconstructed test set), and the gap is empirically large.

The lesson: Hume's problem is not only a philosophical antique. It is the exact philosophical content of every empirical generalization claim in modern AI, and the modern responses to it are the formal apparatus that practical ML uses, knowingly or not.

Common Confusions

Confusion 1: Hume's argument is psychological. Hume's argument has a psychological conclusion (induction is custom rather than reason) but the argument itself is purely logical: induction has neither a deductive justification nor a non-circular inductive one. The logical argument is what is hard to refute.

Confusion 2: probability theory solves it. Probability theory provides a formalism for inductive reasoning. It does not justify the framework as a whole. Bayesian conditionalization is the right way to reason given a prior; the prior itself is not justified by the formalism. Hume's problem reappears as the prior problem.

Confusion 3: the problem only matters in philosophy class. The problem is the technical content of every generalization claim in empirical science, statistics, and ML. The Recht et al. ImageNet result is Hume's argument in concrete form. Distribution-shift research is professional engineering with the problem as its central concern.

Confusion 4: the problem is "solved" because we now use Bayesian methods. Bayesian methods give a precise framework for relating prior, evidence, and posterior. They do not select the prior. Goodman's grue says the prior choice is substantive. Bayesian methods sharpen the problem; they do not solve it.

Three Exercises

Exercise 1. Reconstruct the standard form of Hume's argument as a numbered argument. Then identify which premise (P1, P2, P3) each of the following responses challenges and how:

(a) Reichenbach's pragmatic vindication. (b) Bayesian conditionalization with a uniform prior. (c) Naturalized epistemology. (d) PAC learning theory.

Exercise 2. Construct your own grue-style predicate. Specifically, define a predicate $G$ such that:

(a) Every observation made before time $t_0$ is consistent with both $G$ and an ordinary predicate $P$. (b) After $t_0$, $G$ and $P$ make incompatible predictions.

Identify which background-theory commitment leads us to project $P$ rather than $G$ . Be specific about what makes one predicate more "natural" than the other.

Exercise 3. Apply Hume's problem to a current ML system. Pick a system you are familiar with (an LLM, a vision model, a recommendation system, a coding agent). Identify:

(a) The distribution under which the model was evaluated. (b) The deployment distribution. (c) The specific assumptions about the relationship between (a) and (b) that the model's evaluation does not establish. (d) The empirical failure mode that would surface if those assumptions are wrong.

This exercise should produce a Hume-style analysis of one specific deployment, in the style of the Empiricism essay's case analyses.

Sketch of answers

Answer 1. The standard form was given above (P1, P2, P3, P4, C).

(a) Reichenbach does not directly challenge any premise. Instead, he accepts P1-P4 and argues that, even without rational justification, induction is pragmatically the best available method. This is a meta-level response: the conclusion of Hume's argument is granted but is shown not to imply that induction should be abandoned.

(b) Bayesian conditionalization challenges P3 by reframing what "induction" is. Bayesian update is mathematically derivable from probability axioms; it is not an independent inductive principle. Hume's circularity worry only applies if induction is treated as a stand-alone rule. If induction is just Bayesian conditionalization on observed evidence, the rule is justified by the probability axioms (which can be motivated independently, e.g., Dutch-book arguments). The catch: the response holds only relative to a prior, and the prior is unjustified by the formalism.

(c) Naturalized epistemology challenges P1's framing. Quine argues that the demand for a non-circular justification is incoherent (deduction has no non-circular justification either). Stop asking for justification; ask instead how human cognition does induction, empirically. This is a meta-level move that changes the question.

(d) PAC learning challenges P3 by giving content to "probable justification" within a precise framework. Conditional on the i.i.d. assumption and a hypothesis class, PAC bounds give quantitative guarantees that empirical and true error are close. The response works within the framework. The framework's i.i.d. assumption is itself inductive, so Hume's problem reappears in choosing the framework, just as with the Bayesian response.

Answer 2. One example. Define bleen: an object is bleen iff observed before time $t_0$ and blue, or observed after $t_0$ and green.

Now consider the property "the sky is blue." Until $t_0$, every observation of a clear daytime sky has been blue and bleen. After $t_0$:

The "blue" projection: the next clear daytime sky observation is blue.
The "bleen" projection: the next clear daytime sky observation is green.

The data is identical; the predictions differ. What makes us prefer "blue"?

The natural answer: blue is a simple predicate of physical optics. Its definition does not mention time. Bleen does mention time. We have strong physical reasons to expect that color predicates of the natural-kind sort do not depend on time.

The general principle: a predicate is more projectible if it is well-entrenched in our scientific theories, figures in related theories that are similarly entrenched, has a clear definition independent of arbitrary time-points, and has been used in successful past predictions. Blue is well-entrenched; bleen is not.

But here is Goodman's point. None of the projectibility-justifying considerations are deductive. They are themselves inductive claims about the relationship between predicates and successful generalization. We are doing induction about projectibility. The new riddle is therefore not avoided; it is folded back into the same problem at a meta-level.

Answer 3. Consider a vision model trained on a curated dataset of natural images.

(a) Evaluation distribution. Held-out test images drawn from the same dataset as training, with the same labeling protocol, the same image-acquisition conditions, and the same label distribution.

(b) Deployment distribution. Production images uploaded by users. Different cameras, different lighting, different framing, different demographic distribution, different image-quality protocols.

(c) Unstated assumptions. The model's reported test accuracy assumes that production data is drawn from the same distribution as the test set. This is rarely true: production users have different cameras, different image-capture habits, different subject matter. The Recht et al. reconstruction result shows that even carefully recreated test sets can produce accuracy drops of roughly 11-14 percent on ImageNet; deployment-data shift can be larger still.

(d) Predicted failure mode. Accuracy on production data is lower than reported test accuracy, with the gap growing as deployment conditions diverge from training conditions. Specific subgroups with under-represented training data may have particularly large accuracy drops. Without continuous monitoring, the gap between reported and deployed accuracy may not be visible to the team or the user.

This is a Humean analysis of one specific deployment. The point is that Hume's problem is not abstract; it has specific, predictable, empirical consequences that affect how the system should be evaluated, deployed, and monitored.

Where the Problem Lives in Practice

Three concrete uses.

Distribution-shift research in machine learning. The DomainBed benchmark (Gulrajani and Lopez-Paz 2020) evaluates how various ML algorithms handle distribution shift. The result, repeatedly: carefully tuned empirical risk minimization with data augmentation performs as well as or better than more sophisticated domain-generalization techniques. The lesson is Humean: no algorithmic shortcut around the inductive gap has yet been found.

The replication crisis. When findings fail to replicate, the original sample's evidence often turns out not to license the population-level claim. This is an empirical instantiation of Hume's worry. Statistical responses (pre-registration, larger samples, focused replications) are the modern engineering answer to the philosophical concern.

Evidence-based policy. Translating research findings (typically drawn from small, selected samples) into policy (applied to large, heterogeneous populations) is the inductive jump from past evidence to future application at societal scale. When a policy justified by promising studies fails at scale, that failure is the practical face of Hume's problem.

Prerequisites and Next Pages

Prerequisites: What Is Epistemology?, What Is Logic?.
Related: Empiricism, Induction, and the Limits of LLM Generalization, the existing essay that applies Hume directly to ML.
Next: Bayesian Epistemology, the dominant formal framework for inductive reasoning.
Forward synthesis: Knowledge, Justification, and LLMs.

References

Primary texts:

Hume, David. A Treatise of Human Nature. 1739. Book I, Part III, Sections 6-15. The original formulation of the problem.
Hume, David. An Enquiry Concerning Human Understanding. 1748. Sections IV and V. The second, more polished, version of the argument.
Goodman, Nelson. Fact, Fiction, and Forecast. Harvard, 1955. The new riddle and the grue paradox.
Reichenbach, Hans. The Theory of Probability. University of California Press, 1949. The pragmatic vindication.

Modern reference:

Howson, Colin, and Peter Urbach. Scientific Reasoning: The Bayesian Approach. Open Court, 1989, 3rd ed. 2006. The standard book-length Bayesian response.
Henderson, Leah. "The Problem of Induction." Stanford Encyclopedia of Philosophy. The clearest single survey of the modern responses.
Mitchell, Tom M. "The Need for Biases in Learning Generalizations." Rutgers Computer Science Technical Report CBM-TR-117, 1980. The locus classicus of inductive bias in machine learning.
Vapnik, V. N. The Nature of Statistical Learning Theory. Springer, 1995. The mathematical statistics response.
Quine, W. V. O. "Epistemology Naturalized." 1969. The naturalized response.
Recht, B., R. Roelofs, L. Schmidt, and V. Shankar. "Do ImageNet Classifiers Generalize to ImageNet?" 2019. Empirical evidence for the live shape of Hume's problem in modern ML.

Stanford Encyclopedia entries:

"The Problem of Induction."
"Goodman's New Riddle of Induction" (within Confirmation, often).
"Hume on Induction and the Problem of Causation."
"Bayesian Epistemology."
"Statistical Learning Theory and Inductive Bias" (related, on philosophy of statistical inference).

¹ Reichenbach, Hans. The Theory of Probability. University of California Press, 1949. The pragmatic vindication appears in §§87-91.
² Goodman, Nelson. Fact, Fiction, and Forecast. Harvard, 1955. The grue paradox is in Chapter III, "The New Riddle of Induction."
³ Howson, Colin, and Peter Urbach. Scientific Reasoning: The Bayesian Approach. Open Court, 1989. The book-length defense of Bayesian responses to Hume.
⁴ Quine, W. V. O. "Epistemology Naturalized." In Ontological Relativity and Other Essays, Columbia, 1969. The most influential naturalized-turn paper.
⁵ Valiant, Leslie G. "A Theory of the Learnable." Communications of the ACM 27, no. 11 (1984): 1134-1142. The founding paper of PAC learning. Vapnik, V. N., and A. Y. Chervonenkis. "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities." Theory of Probability and Its Applications 16 (1971): 264-280. The VC theory.
⁶ Recht, Benjamin, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. "Do ImageNet Classifiers Generalize to ImageNet?" Proceedings of Machine Learning Research 97 (2019): 5389-5400.