Russell's Paradox: Unrestricted Comprehension and Set-Theoretic Responses

The Puzzle

Consider the property not being a member of oneself. Some sets have this property: $\{1, 2, 3\}$ is a set whose members are numbers, not the set itself, so it is not a member of itself. Some sets seem to lack this property: the set of all sets (if there is such a thing) would be a set, and a member of itself.

Now form the set of all sets that have the property:

R = \{x : x \notin x\}.

Is $R \in R$ ?

If $R \in R$ , then $R$ has the property defining $R$ , so $R \notin R$ . Contradiction.

If $R \notin R$ , then $R$ has the property defining $R$ , so $R \in R$ . Contradiction.

Either way we land in contradiction. $R$ cannot consistently be a member of itself or fail to be one. Yet $R$ was constructed by what looked like a perfectly reasonable rule: form the set of all things satisfying a given property.

This is Russell's Paradox. Bertrand Russell discovered it in 1901 while working on the foundations of mathematics, and reported it to Gottlob Frege in a letter dated June 16, 1902. Frege had just sent the second volume of his Grundgesetze der Arithmetik to the printer. The paradox showed that Frege's central system was inconsistent.¹

This page assumes What Is Logic? and Predicate Logic. The latter especially: Russell's paradox is most cleanly stated in the language of first-order logic with set-membership as a binary predicate.

Reconstructed Formally

What makes Russell's paradox forced rather than a quirk of natural-language ambiguity is that it follows from a single axiom that looks compulsory.

Frege's Basic Law V (Grundgesetze, 1893): For any propositional function $\varphi(x)$ , there exists a set $\{x : \varphi(x)\}$ whose members are exactly the things satisfying $\varphi$ . Formally:

\forall \varphi\, \exists y\, \forall x\, (x \in y \leftrightarrow \varphi(x)).

This is the principle of unrestricted comprehension: for any property whatsoever, there is a set containing exactly the things with that property. It is the most natural-looking axiom of set formation and was a cornerstone of Frege's logicism.

The derivation. Apply Basic Law V with $\varphi(x) := x \notin x$ . The axiom guarantees a set $R$ such that, for all $x$ :

x \in R \leftrightarrow x \notin x.

In particular, taking $x = R$ :

R \in R \leftrightarrow R \notin R.

This is a sentence equivalent to its own negation. By classical propositional logic, this is a contradiction: $R \in R$ and $\neg (R \in R)$ both hold. By the principle of explosion, every sentence is provable in the system. The system is inconsistent.

The derivation uses three ingredients:

Unrestricted comprehension (Basic Law V).
Self-membership is a meaningful property (so $x \notin x$ is a legitimate $\varphi$ ).
Classical logic.

Removing any of the three blocks the paradox. The four major solutions correspond to four different choices of which to restrict.

What the Paradox Forced

The discovery of Russell's paradox in 1901-1902 was a foundational crisis. Working mathematicians at the turn of the twentieth century had been treating set theory as a stable foundation. Russell's contradiction showed that the most natural-looking axiom of set formation, the one Frege had taken as foundational, generated outright contradiction.

Three philosophical lessons emerged immediately:

The line between logic and set theory was not as clean as Frege had hoped. Logicism, the view that mathematics reduces to pure logic, was in trouble.
Self-reference and too much set-formation freedom were dangerous. The paradox arises in the same family as the Liar (covered in The Liar Paradox) and Burali-Forti (1897, the paradox of the largest ordinal).
The choice of foundational axioms is not arbitrary. Different restrictions on comprehension give different mathematics; the choice has lasting consequences.

The technical responses unfolded over the following thirty years.

Solution Family 1: Russell's Type Theory

Bertrand Russell, having discovered the paradox, also produced the first major response. In Principia Mathematica (1910-1913, with Alfred North Whitehead), he developed ramified type theory.²

The construction. Every object has a type. There are individuals (type 0), sets of individuals (type 1), sets of sets of individuals (type 2), and so on. The membership relation $x \in y$ is well-formed only when the type of $y$ is one higher than the type of $x$ .

How this dissolves the paradox. The expression $x \in x$ is not well-formed: it would require $x$ to have a type both equal to and higher than its own type. Since $x \in x$ cannot be written, the property $x \notin x$ cannot be defined, and the Russell set $R$ cannot be constructed. The paradox is blocked at the level of syntax, before it can be derived.

Cost of the solution. Type theory is more cumbersome than set theory: variables must be type-annotated, and many constructions that look natural in set theory become awkward. Ramified type theory required additional axioms of reducibility to recover ordinary mathematics, and these axioms looked ad hoc. The simple type theory of Ramsey-Church is cleaner but loses some of Russell's anti-vicious-circle intuition.

Despite the costs, type theory has had a remarkable second life. Modern proof assistants (Coq, Lean, Agda) all use descendants of Russell-Church type theory. Type theory is now the de facto foundation of computer-checked mathematics, where the cost of manual type-annotation is paid by the machine. Russell's response to his own paradox is, a century later, the working substrate of formal verification.

Solution Family 2: Zermelo-Fraenkel Set Theory

Ernst Zermelo (1908) proposed a different fix.³ Keep set theory as a single-sorted theory with one membership relation, but restrict comprehension. Replace Frege's unrestricted Basic Law V with axioms that say: for any existing set $A$ and any property $\varphi$ , the set $\{x \in A : \varphi(x)\}$ exists.

The key axiom is Separation (also called the axiom of Aussonderung or restricted comprehension):

\forall A\, \forall \varphi\, \exists B\, \forall x\, (x \in B \leftrightarrow x \in A \land \varphi(x)).

The new set $B$ is always a subset of an already-existing set $A$ . We cannot form $\{x : x \notin x\}$ unrestrictedly; we can only form $\{x \in A : x \notin x\}$ for some specific $A$ already known to exist.

How this dissolves the paradox. Suppose we tried to form $R = \{x : x \notin x\}$ using Separation. We would need an existing set $A$ to separate from. If there is no "set of all sets" (and ZF is designed to deny that there is), the Russell construction never gets started. The Russell-style argument, applied within ZF, instead produces an interesting theorem: there is no set of all sets. (Proof: suppose $V$ is the set of all sets. Apply Separation with $\varphi(x) := x \notin x$ to get $R = \{x \in V : x \notin x\}$ . Then $R \in R \leftrightarrow R \in V \land R \notin R$ . Since $R$ is a set, $R \in V$ , so $R \in R \leftrightarrow R \notin R$ , contradiction. Therefore no set $V$ contains all sets.)

ZF is supplemented with several other axioms: Pairing (sets of two elements exist), Union, Power Set, Infinity, Replacement (Fraenkel-Skolem), Foundation (no set is a member of itself, transitively), and (in ZFC) the Axiom of Choice. Together these axioms define a rich enough universe to contain all of standard mathematics.

Cost of the solution. ZF is the working consensus among mathematicians. The cost: we lose the "set of all sets," and a few intuitive ideas (universal complementation, the universe as an object) are no longer available. These costs are widely judged acceptable. ZFC is the de facto foundation of mainstream mathematics.

Solution Family 3: Von Neumann-Bernays-Goedel Class Theory

John von Neumann (1925), Paul Bernays (1937 onward), and Kurt Goedel (1940) developed a different response.⁴ Instead of restricting comprehension, distinguish between two kinds of collections: sets (which can be members of other collections) and proper classes (which cannot).

In NBG, every collection of objects ("class") is well-defined, but only some classes are sets. A class that is not a set is a proper class. Sets can belong to classes; proper classes cannot belong to anything. The distinction is captured by an axiom: a class is a set iff it is a member of some class.

How this dissolves the paradox. The Russell collection $R = \{x : x \notin x\}$ is well-defined as a class. Asking whether $R \in R$ requires $R$ to be a set (only sets can be members). The argument shows that $R$ cannot be a set: if it were, $R \in R \leftrightarrow R \notin R$ would follow as in Frege. Therefore $R$ is a proper class, and the question " $R \in R$ ?" is answered: no, because $R$ is not in anything.

The construction is technically attractive: it preserves the universal-collection intuition (the class of all sets exists; it just is not itself a set) while blocking the contradiction. ZF and NBG are equivalent in expressive power for talking about sets: any theorem about sets provable in ZF is provable in NBG, and any theorem of NBG that does not mention proper classes is provable in ZF.

Cost of the solution. Two-sorted logic is heavier than one-sorted ZF. NBG is more expressive but more cumbersome. For most mathematical practice, ZFC's flexibility is preferred; NBG is used where a smooth treatment of "the class of all groups," "the class of all topological spaces," or other large collections is needed.

Solution Family 4: Quine's New Foundations

Willard Van Orman Quine (1937) proposed a more radical move.⁵ Allow unrestricted comprehension for stratified formulas only. A formula is stratified if it is possible to assign each variable a natural-number "level" such that whenever $x \in y$ appears, the level of $y$ is one higher than the level of $x$ .

The comprehension principle of NF:

\exists y\, \forall x\, (x \in y \leftrightarrow \varphi(x)) \quad \text{whenever } \varphi \text{ is stratified.}

How this dissolves the paradox. The formula $x \notin x$ is not stratified: $x$ would have to be assigned a single level, but $x \in x$ requires the level of $x$ to be one higher than the level of $x$ , which is impossible. So the Russell set is not constructible by NF's comprehension axiom. The paradox is blocked at the level of formula well-formedness.

NF is one-sorted (no separate types or classes) and admits a universal set $V = \{x : x = x\}$ (the formula $x = x$ is trivially stratified). This makes NF technically distinctive among the major foundations.

Cost of the solution. NF has been studied since 1937, but its consistency is not known to be provable from the consistency of ZFC. A variant called NFU (NF with Urelements), introduced by Ronald Jensen in 1969, is provably consistent if Peano Arithmetic is. NF and NFU have small but devoted communities; mainstream mathematics still uses ZFC. The interest of NF is partly philosophical (what is the natural way to restrict comprehension?) and partly technical (alternative foundations matter for understanding what is conventional in our chosen one).

Where the Solutions Stand Today

Modern mathematical foundations are pluralistic, but the working consensus is sharp.

Foundation	Used by	Trade-off
ZFC	Almost all working mathematicians	Standard; flexible; some intuitive constructions (universal set, set of all sets) lost
NBG	Working with large categories or the class of all sets	Two-sorted; conservative over ZFC for set-only theorems
Type theory (Russell-Church-Martin-Loef)	Computer-checked formal mathematics; Coq, Lean, Agda	Type-annotated; closer to programming languages; the foundation of automated proof
NF / NFU	Specialist research	One-sorted; universal set; consistency status varies

For day-to-day mathematics, ZFC is the default. For computer-verified mathematics, type theory has become the practical choice. Russell's paradox is therefore not a settled historical curiosity but a forking point whose three remaining branches all carry weight.

Connection to the Liar Paradox

Russell's paradox and the Liar paradox are structurally related. Both exploit a kind of self-reference. Both arise from an unrestricted principle (Frege's Basic Law V; the T-schema applied universally) combined with classical logic. Both forced a choice between restricting the unrestricted principle, restricting self-reference, or restricting classical logic.

The structural similarity is not accidental. Self-reference has a precise common form. Russell himself recognized this; his early work on the paradoxes treated them as instances of a more general "vicious-circle" phenomenon. Goedel's incompleteness theorems (treated in Syntax vs Semantics) use the same self-reference machinery to produce sentences that are true but unprovable rather than paradoxical.

The general pattern. Diagonalization is the technique of constructing, for any total function $f$ on objects of some kind, an object that differs from $f$ at every input. Russell's paradox diagonalizes against the membership function. The Liar diagonalizes against the truth function. Goedel diagonalizes against the provability function. Cantor's theorem (the powerset of any set is strictly larger than the set) diagonalizes against any proposed bijection. The technique is the same; the specific result varies with the function being diagonalized against.

This is one reason Russell's paradox is foundational rather than merely curious. It is the diagonal argument applied to membership, and the diagonal argument is the core technique of twentieth-century mathematical logic.

Common Confusions

Confusion 1: Russell's paradox shows mathematics is inconsistent. It shows unrestricted comprehension is inconsistent. The standard responses (ZFC, NBG, type theory, NF) all preserve a consistent mathematics by restricting which collections count as sets. Standard mathematics is not believed to be inconsistent; the paradox forced a careful axiomatization, which mainstream mathematics has had since 1908.

Confusion 2: the paradox is just self-reference. Self-reference alone does not produce contradictions. The sentence "this sentence has five words" is self-referential but consistent (it is false, in this case; "this sentence has six words" would be true). The paradox arises from self-reference plus a complete construction principle (unrestricted comprehension for Russell, T-schema for Liar).

Confusion 3: ZFC is "the" right foundation. ZFC is the working consensus, but it is not philosophically forced. Type theory is just as legitimate, more natural for computational purposes, and has a strong claim to be foundational from a constructive or proof-theoretic standpoint. Mathematicians often speak as though ZFC were the only foundation; this is convention, not necessity.

Confusion 4: foundations do not affect ordinary mathematics. True for most theorems but false at the boundaries. Choice of foundation affects: which collections count as "sets" (large categories matter for working algebraists and topologists), what the axiom of choice looks like (constructively or classically), how proof verification handles self-reference (different proof assistants make different choices), and the exact statement of metatheorems. Foundations matter even when one chooses to ignore them.

Two Exercises

Exercise 1. Apply the Russell-style construction inside ZFC to derive the theorem "there is no set of all sets." Specifically, suppose for contradiction that $V$ is a set whose members are all sets. Use the Separation axiom with $\varphi(x) := x \notin x$ to construct $R = \{x \in V : x \notin x\}$ . Derive a contradiction. Conclude that $V$ cannot be a set.

Exercise 2. Show that the formula $x \in x$ is not stratifiable in Quine's NF. (Hint: try assigning levels to the variables. There is only one variable, $x$ . The membership $x \in x$ requires the level of the right-side $x$ to be one higher than the left-side $x$ . Both occurrences are the same variable.) Then identify which of the following formulas are stratifiable, and explain why.

(a) $x \in y$ . (b) $\exists y\, x \in y \land y \in z$ . (c) $\exists y\, x \in y \land y \in x$ . (d) $\forall y\, (y \in x \to y = z)$ .

Sketch of answers

Answer 1. Suppose $V$ is a set whose members are all sets. By the Separation axiom of ZFC, applied to $V$ with the property $\varphi(x) := x \notin x$ , the set $R = \{x \in V : x \notin x\}$ exists.

We have, for any $x$ :

x \in R \iff x \in V \land x \notin x.

Take $x = R$ :

R \in R \iff R \in V \land R \notin R.

Since $R$ is a set, $R \in V$ (because $V$ contains every set). The biconditional simplifies to:

R \in R \iff R \notin R.

This is a contradiction. Therefore the assumption that $V$ is a set fails: there is no set of all sets. (In NBG, V is a proper class; in ZFC, V simply does not exist as a set.)

Answer 2. The formula $x \in x$ requires the level of $x$ on the right to be one higher than the level of $x$ on the left. But both occurrences are the same variable, so they must have the same level. Contradiction. Not stratifiable.

(a) $x \in y$ is stratifiable: assign $x$ level $0$ , $y$ level $1$ .

(b) $\exists y\, x \in y \land y \in z$ is stratifiable: assign $x$ level $0$ , $y$ level $1$ , $z$ level $2$ .

(c) $\exists y\, x \in y \land y \in x$ is not stratifiable: the first conjunct requires $y$ to have level one higher than $x$ ; the second requires $x$ to have level one higher than $y$ . The two constraints contradict each other.

(d) $\forall y\, (y \in x \to y = z)$ is stratifiable: $y$ has level $0$ , $x$ has level $1$ , $z$ has level $0$ (since $y = z$ requires $y$ and $z$ to have the same level).

The general rule: a formula is stratifiable iff there is a consistent assignment of natural-number levels to its variables that satisfies the membership constraint at every $\in$ atom and the equal-level constraint at every $=$ atom.

Where Russell's Paradox Lives in Practice

Three concrete uses.

Foundations of computer-checked mathematics. Coq, Lean, and Agda all implement variants of dependent type theory descended from Russell-Church and Martin-Loef. The hierarchy of type universes in these systems (Type 0 : Type 1, Type 1 : Type 2, ...) is a direct response to Russell's paradox in computational dress. Without the universe hierarchy, the systems would be inconsistent (this is Girard's paradox: assuming Type : Type derives a contradiction analogous to Russell's). Anyone writing code in a modern proof assistant is, structurally, paying the cost of avoiding Russell.

Database theory and finitary collections. Even in finite-mathematics applications, the question of which "sets of sets" are well-defined arises. Database design imposes implicit type constraints to avoid Russell-like cycles in self-referential schemas.

Foundations of category theory. Categories with objects "all sets" or "all groups" are large categories, not sets in the usual ZFC sense. The handling of large categories requires either NBG, Grothendieck universes added to ZFC, or category-theoretic foundations (such as the Elementary Theory of the Category of Sets). The choice of how to handle large categories is a direct descendant of Russell's paradox.

The general lesson: Russell's paradox is not historical. It is a structural constraint on any sufficiently rich formal system, and the response to it is one of the few things that mainstream mathematicians, type theorists, and category theorists all care about.

Prerequisites and Next Pages

Prerequisites: What Is Logic?, Predicate Logic.
Related: The Liar Paradox, the structural cousin in the theory of truth rather than set membership.
Forward link: What Is a Symbolic System? where the connection to type universes in modern proof assistants is mentioned.

References

Primary texts:

Frege, Gottlob. Grundgesetze der Arithmetik. Two volumes, 1893 and 1903. The system Russell's paradox refuted. English translation: Basic Laws of Arithmetic, P. Ebert and M. Rossberg, eds., Oxford, 2013.
Russell, Bertrand. Letter to Frege, June 16, 1902. Reprinted in van Heijenoort, From Frege to Goedel, Harvard, 1967, pp. 124-125.
Whitehead, Alfred North, and Bertrand Russell. Principia Mathematica, 3 vols., Cambridge, 1910-1913. Russell's response: ramified type theory.
Zermelo, Ernst. "Untersuchungen über die Grundlagen der Mengenlehre I." Mathematische Annalen 65 (1908): 261-281. The original ZF axiomatization.
von Neumann, John. "Eine Axiomatisierung der Mengenlehre." Journal für die reine und angewandte Mathematik 154 (1925): 219-240. NBG class theory.
Quine, W. V. O. "New Foundations for Mathematical Logic." American Mathematical Monthly 44 (1937): 70-80. NF.

Modern reference:

Enderton, Herbert B. Elements of Set Theory. Academic Press, 1977. Standard ZFC textbook.
Jech, Thomas. Set Theory: The Third Millennium Edition. Springer, 2003. Advanced reference.
Hatcher, William S. The Logical Foundations of Mathematics. Pergamon, 1982. Compares all four major foundations side by side.
Forster, T. E. Set Theory with a Universal Set. Oxford, 2nd ed., 1995. Definitive treatment of NF and NFU.
Awodey, Steve, and Andrej Bauer. Type Theory. Available online; the standard modern introduction to dependent type theory as a foundation.

Stanford Encyclopedia entries (link, do not paraphrase):

Russell's letter and Frege's reply are reprinted in Jean van Heijenoort, ed., From Frege to Goedel: A Source Book in Mathematical Logic, 1879-1931, Harvard, 1967, pp. 124-128. Frege's reply opens: "Your discovery of the contradiction caused me the greatest surprise and, I would almost say, consternation, since it has shaken the basis on which I intended to build arithmetic." ↩
Whitehead, Alfred North, and Bertrand Russell. Principia Mathematica, 3 vols., Cambridge, 1910-1913. Volume 1, "Introduction" and "Theory of Types." A simplified version, simple type theory, was developed by F. P. Ramsey and Alonzo Church. ↩
Zermelo, Ernst. "Untersuchungen über die Grundlagen der Mengenlehre I." Mathematische Annalen 65 (1908): 261-281. "Investigations in the Foundations of Set Theory I." The original Zermelo axiomatization, later extended by Abraham Fraenkel (1922) and Thoralf Skolem (1922) into what is now ZF, with the Axiom of Choice giving ZFC. ↩
von Neumann, John. "Eine Axiomatisierung der Mengenlehre." Journal für die reine und angewandte Mathematik 154 (1925): 219-240. Bernays in a series of papers 1937-1954. Goedel's monograph The Consistency of the Axiom of Choice and of the Generalized Continuum Hypothesis, Princeton, 1940, uses the NBG framework. ↩
Quine, W. V. O. "New Foundations for Mathematical Logic." American Mathematical Monthly 44 (1937): 70-80. The system is now called NF. ↩