Skip to main content

foundations · Essay 4 · 14 min

Syntax vs Semantics in Formal Systems

In a formal system, syntax is the set of rules for which strings count as legitimate expressions; semantics is the assignment of meaning to those expressions. The page works through propositional logic and a small fragment of arithmetic to show the distinction in action, names the standard correspondence theorems (soundness, completeness, Tarski's truth schema), and notes where the page is the philosophical / formal-systems version, not the linguistic one.

Quick Answer

In a formal system, syntax is the set of rules for which strings of symbols count as legitimate expressions of the language. Semantics is the rule for assigning meaning (typically truth-values, denotations, or structures) to those expressions.

A few illustrations.

  • The string (p ∧ q) is syntactically well-formed in propositional logic. The string p ∧ ∧ q is not.
  • Once we fix an interpretation that assigns truth-values to atomic letters, the well-formed string (p ∧ q) means "the conjunction of pp and qq is true under the assignment iff both pp and qq are."
  • Syntax tells you which derivations are legal proofs. Semantics tells you which sentences are true in which models.

This page is the philosophical and formal-systems version of the syntax-vs-semantics distinction. The linguistic version, sentence structure (syntax) versus sentence meaning (semantics) in natural languages, is a different topic with overlapping vocabulary. For the linguistic treatment, see LinguisticsPath. For the philosophy of language treatment of meaning and reference, that lives separately under philosophy-of-language and is not the same distinction.

Side by Side

SyntaxSemantics
What it studiesStrings, grammars, derivation rulesTruth, models, denotation, satisfaction
Primary objectsSymbols, formulas, proofsStructures, valuations, interpretations
Mode of evaluationMechanical: does this string follow the rules?Truth-conditional: in this structure, does this formula hold?
Standard relationΓφ\Gamma \vdash \varphi (φ\varphi is derivable from Γ\Gamma)Γφ\Gamma \models \varphi (φ\varphi is true in every model of Γ\Gamma)
Failure modeSyntactic ill-formedness, ungrammaticalitySemantic falsehood, lack of model
Example legality"Is ∀x (P(x) → Q(x)) a well-formed formula?""Does ∀x (P(x) → Q(x)) hold in the structure M\mathcal{M}?"

The two sides answer different questions. They are connected by the soundness-completeness bridge below, but they are not the same.

Why the Distinction Exists

Before Tarski, "φ\varphi is true" was treated as primitive, the property truth applied to the sentence directly. This led to the well-known semantic paradoxes (the Liar being the canonical case: "this sentence is false"). Tarski's 1933 The Concept of Truth in Formalized Languages showed that for formalized languages the paradox dissolves once truth is defined relative to a structure (an interpretation specifying what the symbols denote) and stratified across an object language and a metalanguage containing it.

The pivotal insight: truth is not a property of strings considered alone; it is a relation between strings and structures. Syntactic properties live at the level of the strings. Semantic properties live at the level of the relation. Conflating them is what produced the paradox.

This split made formal logic possible as a precise mathematical discipline, and it is the conceptual foundation of model theory, formal semantics, and substantial parts of theoretical computer science.

A Worked Example: Propositional Logic

Take a tiny formal system: propositional logic with atomic letters p,q,rp, q, r and connectives ¬,,,\neg, \land, \lor, \to.

Syntax

The alphabet is the set {p,q,r,¬,,,,(,)}\{p, q, r, \neg, \land, \lor, \to, (, )\}.

The formation rules (the grammar):

  1. Any atomic letter is a well-formed formula (wff).
  2. If φ\varphi is a wff, so is ¬φ\neg \varphi.
  3. If φ\varphi and ψ\psi are wffs, so are (φψ)(\varphi \land \psi), (φψ)(\varphi \lor \psi), (φψ)(\varphi \to \psi).
  4. Nothing else is a wff.

By these rules, ((pq)r)((p \land q) \to r) is a wff. The string pqp \land \to q is not.

The proof rules (one possible system, natural deduction):

  • ∧-introduction: from φ\varphi and ψ\psi, derive (φψ)(\varphi \land \psi).
  • ∧-elimination: from (φψ)(\varphi \land \psi), derive φ\varphi (or ψ\psi).
  • →-elimination (modus ponens): from φ\varphi and (φψ)(\varphi \to \psi), derive ψ\psi.
  • And several more.

A derivation of ψ\psi from a set Γ\Gamma is a finite sequence of wffs ending in ψ\psi, where each wff is either in Γ\Gamma or follows from earlier wffs by a proof rule. We write Γψ\Gamma \vdash \psi when such a derivation exists. Note: this is a purely syntactic notion. No mention of truth.

Semantics

A valuation is a function vv assigning each atomic letter a truth-value in {T,F}\{\text{T}, \text{F}\}.

Given a valuation, we extend vv to all wffs by the standard truth-functional rules:

  • v(¬φ)=Tv(\neg \varphi) = \text{T} iff v(φ)=Fv(\varphi) = \text{F}.
  • v((φψ))=Tv((\varphi \land \psi)) = \text{T} iff v(φ)=v(ψ)=Tv(\varphi) = v(\psi) = \text{T}.
  • v((φψ))=Tv((\varphi \lor \psi)) = \text{T} iff v(φ)=Tv(\varphi) = \text{T} or v(ψ)=Tv(\psi) = \text{T}.
  • v((φψ))=Tv((\varphi \to \psi)) = \text{T} iff v(φ)=Fv(\varphi) = \text{F} or v(ψ)=Tv(\psi) = \text{T}.

A formula φ\varphi is semantically valid (a tautology) iff v(φ)=Tv(\varphi) = \text{T} for every valuation vv. We write φ\models \varphi.

We write Γφ\Gamma \models \varphi to mean: every valuation that makes every member of Γ\Gamma true also makes φ\varphi true. This is semantic consequence.

Note again: this is a purely semantic notion. No mention of proofs.

The Bridge

The two notions agree.

Soundness theorem (propositional logic): Γφ\Gamma \vdash \varphi implies Γφ\Gamma \models \varphi. If you can prove it, it is true in every model of the premises.

Completeness theorem (propositional logic, Post 1921; first-order logic, Gödel 1929): Γφ\Gamma \models \varphi implies Γφ\Gamma \vdash \varphi. If it is true in every model of the premises, you can prove it.

The two together: Γφ    Γφ\Gamma \vdash \varphi \iff \Gamma \models \varphi. The syntactic and semantic notions of consequence coincide.

This is not trivial. Soundness is straightforward (the proof rules were designed to preserve truth). Completeness is hard: it says the proof rules are enough, every semantically valid argument has a syntactic proof. That this holds for first-order logic is one of the central results of modern mathematical logic.

For higher-order logic and certain stronger systems, completeness fails (Lindström's theorems characterize this). The bridge is robust for first-order logic but does not extend automatically.

A Worked Example: Arithmetic and Gödel

The cleanest place to see the syntax-semantics gap matter is Gödel's incompleteness theorems.

Consider Peano arithmetic (PA), the standard formal axiomatization of the natural numbers with addition and multiplication.

  • Syntactic statement: a sentence φ\varphi is provable in PA, written PAφ\text{PA} \vdash \varphi, iff there is a finite derivation of φ\varphi from the PA axioms using the proof rules of first-order logic.
  • Semantic statement: a sentence φ\varphi is true in the standard model of arithmetic iff it holds when interpreted on the natural numbers N\mathbb{N} with the standard addition and multiplication.

For propositional logic and pure first-order logic, completeness gives us \vdash iff \models. For PA, the picture is sharper.

Gödel's first incompleteness theorem (1931). There is a sentence GG in the language of PA such that GG is true in the standard model of arithmetic but not provable in PA. Symbolically, NG\mathbb{N} \models G but PA⊬G\text{PA} \not\vdash G.

The two notions of consequence, syntactic provability in PA, semantic truth on N\mathbb{N}, come apart for any sufficiently strong formal system that includes arithmetic. The Gödel sentence is true; the system cannot prove it.

This is why the syntax-semantics distinction is not a curiosity. It is the framework that lets us say what Gödel's theorem says.

Where the Distinction Bites

Three live applications.

Soundness vs completeness in proof assistants. A proof assistant like Lean or Coq is sound by design: any proof it accepts is a valid derivation in the underlying type theory. Whether it is complete, whether every true statement of mathematics is derivable in its system, depends on the system. Most foundational systems are deliberately incomplete (Gödel) for known reasons.

Tarski-style truth-conditional semantics. When linguists or philosophers use "truth-conditional semantics" they are using Tarski's framework: assign a model, define truth in the model recursively, and identify the meaning of a sentence with its truth-conditions across models. The distinction between formal syntactic structure and semantic interpretation is what makes this enterprise rigorous.

Reasoning under inconsistency. A classical-logic system in which one inconsistency is provable can derive anything (the principle of explosion: φ\bot \vdash \varphi). Paraconsistent logics are designed so that this syntactic explosion does not happen, useful when reasoning over inconsistent legal codes, databases, or historical sources.

Common Confusions

Confusion 1: well-formed = true. A string can be syntactically well-formed and semantically false. "All transformers are reptiles" is grammatical English and a well-formed first-order sentence; it is also obviously false. Well-formedness is necessary for evaluation; it is not sufficient for truth.

Confusion 2: provable = true. Provability is relative to a formal system. A statement provable in classical logic may not be provable in intuitionistic logic; a statement provable in PA + a strong axiom may not be provable in PA alone. Truth (in the semantic sense) is relative to a model. The two are connected by soundness and completeness for systems where the bridge holds; they remain distinct concepts.

Confusion 3: syntactic vs linguistic syntax. The syntax studied in formal logic and the syntax studied in theoretical linguistics share a vocabulary but are different objects. Linguistic syntax studies the structure of natural-language sentences (constituency, dependency, transformations). Formal-system syntax studies the rules for legal strings in a designed language. Both live under "syntax" in the broad sense. This page is on the formal-system version. The linguistic-syntax page lives on LinguisticsPath, and the boundary between the two domains is documented in the path-network ownership registry.

Confusion 4: model = semantics, valuation = syntax. A common slip in introductory texts. The valuation is part of the semantics: it specifies which atoms are true. The syntax is the formula's structure independent of any valuation.

Exercises

For each pair, decide whether the two strings are (a) the same well-formed formula, (b) two different well-formed formulas, or (c) at least one is not a well-formed formula at all.

  1. (pq)(p \to q) versus pqp \to q (the same up to outer parentheses).
  2. ((pq)r)((p \to q) \to r) versus (p(qr))(p \to (q \to r)).
  3. ¬pq\neg p \land q versus p¬q\land p \neg q.

For each formula, decide whether it is a propositional tautology (true under every valuation), a contradiction (false under every valuation), or contingent (true under some, false under others).

  1. (p(qp))(p \to (q \to p)).
  2. (p¬p)(p \land \neg p).
  3. (pq)(qp)(p \to q) \to (q \to p).

For each pair (Γ,φ)(\Gamma, \varphi), decide whether Γφ\Gamma \models \varphi.

  1. Γ={pq,p}\Gamma = \{p \to q,\, p\}, φ=q\varphi = q.
  2. Γ={pq,¬p}\Gamma = \{p \to q,\, \neg p\}, φ=¬q\varphi = \neg q.
  3. Γ={pq,¬p}\Gamma = \{p \lor q,\, \neg p\}, φ=q\varphi = q.

Answers

  1. (a) Same up to convention on outer parentheses. The two strings are typically treated as notational variants; whether the outer parentheses are required is a convention of the specific syntax.
  2. (b) Different. \to associates to the right by convention, so the second is the standard reading of pqrp \to q \to r. The first is its left-bracketed variant. They differ on the valuation p=T,q=F,r=Fp = \text{T}, q = \text{F}, r = \text{F}, try it and see which evaluates to T\text{T} and which to F\text{F}.
  3. (c) The second string is not well-formed. Connectives in standard infix notation appear between operands; p¬q\land p \neg q violates the formation rule.
  4. Tautology. Often called the "weakening" tautology: anything implies that anything implies it. Verify by truth table.
  5. Contradiction. A statement and its negation cannot both be true under any valuation.
  6. Contingent. Take p=T,q=Tp = \text{T}, q = \text{T}: the antecedent pqp \to q is T, the consequent qpq \to p is T, the whole conditional is T. Take p=F,q=Tp = \text{F}, q = \text{T}: the antecedent pqp \to q is T, the consequent qpq \to p is F, the whole is F. Therefore contingent.
  7. Yes (Γφ\Gamma \models \varphi). Modus ponens is semantically valid.
  8. No. This is denying the antecedent. From ¬p\neg p, neither qq nor ¬q\neg q follows. Counterexample valuation: p=F,q=Tp = \text{F}, q = \text{T} makes the premises true and ¬q\neg q false.
  9. Yes. Disjunctive syllogism. From pqp \lor q and ¬p\neg p, qq must hold.

Prerequisites and Next Pages

References

Primary texts:

  • Tarski, Alfred. "The Concept of Truth in Formalized Languages." 1933. Polish original; English in Logic, Semantics, Metamathematics, Oxford, 1956.
  • Tarski, Alfred. "On the Concept of Logical Consequence." 1936. The model-theoretic definition of \models.
  • Gödel, Kurt. "Über die Vollständigkeit des Logikkalküls" (Completeness of first-order logic). 1929 dissertation; "Über formal unentscheidbare Sätze..." (incompleteness for arithmetic). 1931.

Modern reference:

  • Mendelson, Elliott. Introduction to Mathematical Logic. CRC Press, 6th ed. 2015. Standard graduate-level treatment of syntax, semantics, soundness, and completeness for first-order logic.
  • Enderton, Herbert. A Mathematical Introduction to Logic. Academic Press, 2nd ed. 2001. Cleaner notation; equivalent depth.
  • Boolos, George, John Burgess, and Richard Jeffrey. Computability and Logic. Cambridge, 5th ed. 2007. Covers the syntax-semantics-computability triangle including Gödel.

Stanford Encyclopedia entries (link, not paraphrase):