Generative and Discriminative Models

How Generative and Discriminative Models work.

Mon, Nov 24th
Tags: machine learning · pattern recognition · generative model · discriminative model
Created: 2025-12-15 · Updated: 2025-12-15

Every decision system answers two questions:
1. What do I know about the world?
2. What should I do given what I know?


Step 1: The Classification Process (Two Stages)

Training a model to classify data usually involves two stages:

  1. Inference Stage: Learn from data how likely each class is, given the input.
    → This means learning $p(C_k \mid x)$: the probability that input $x$ belongs to class $C_k$.

  2. Decision Stage: Use those probabilities to make a final choice (e.g., assign a label).
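
A minimal sketch of the two stages in Python, assuming the inference stage has already produced the posteriors (the numbers below are invented for illustration):

```python
import numpy as np

# Hypothetical output of the inference stage: p(C_k | x) for three
# inputs and two classes (one row per input; rows sum to 1).
posteriors = np.array([
    [0.90, 0.10],
    [0.30, 0.70],
    [0.55, 0.45],
])

# Decision stage: assign each input the class with the highest posterior.
labels = np.argmax(posteriors, axis=1)
print(labels)  # [0 1 0]
```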


Step 2: Three Different Ways to Solve This Problem

Bishop describes three approaches, from most to least complex:

(a) Generative Approach

Goal: Model how the data itself is generated for each class.

You learn:

  • $p(x \mid C_k)$: How the data looks inside each class.
  • $p(C_k)$: How common each class is (the prior probability).
  • Then you use Bayes’ theorem to compute the posterior: $p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)}$
  • Finally, you classify based on whichever class gives the highest posterior.

Example:
If you’re classifying emails as “Spam” or “Not Spam,” a generative model tries to learn how spam emails are written and how non-spam emails are written, then compares a new email against both patterns.
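
As a concrete sketch of this pipeline, here is a toy Bernoulli Naive Bayes spam classifier built directly from Bayes’ theorem; the word features, counts, and Laplace smoothing are all invented for illustration:

```python
import numpy as np

# Toy training set: binary indicators for the words ["free", "meeting"].
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]])
y = np.array([1, 1, 0, 0, 1])  # 1 = spam, 0 = not spam

# p(C_k): class priors from class frequencies.
priors = np.array([(y == c).mean() for c in (0, 1)])

# p(x | C_k): per-class word probabilities with Laplace smoothing,
# under the naive assumption that words are independent given the class.
likelihoods = np.array([
    (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in (0, 1)
])

def posterior(x):
    # Bernoulli likelihood of each feature, multiplied across features.
    like = (likelihoods ** x * (1 - likelihoods) ** (1 - x)).prod(axis=1)
    joint = like * priors          # p(x | C_k) p(C_k)
    return joint / joint.sum()     # normalize by p(x): Bayes' theorem

print(posterior(np.array([1, 0])))  # leans heavily toward spam
```

The point to notice is that $p(x \mid C_k)$ and $p(C_k)$ are estimated separately, and the posterior only appears at the end, via Bayes’ theorem.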

Pros:

  • Learns full data distribution: can generate new data (that’s why it’s called “generative”).
  • Can detect outliers/novel data (e.g., something unlike anything seen before).

Cons:

  • Requires lots of data, because learning the entire distribution $p(x \mid C_k)$ is hard.
  • Often wasteful if we only care about final classification, not data generation.

(b) Discriminative Approach

Goal: Skip modeling the data and learn to discriminate between classes directly.

You directly learn:

$$p(C_k \mid x)$$

without ever modeling $p(x \mid C_k)$ or $p(x)$.

Example: Logistic Regression or Neural Networks; they learn directly how inputs map to class probabilities.
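
A minimal sketch using scikit-learn’s LogisticRegression on invented 1-D data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: two classes separated along x (values are invented).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The model fits p(C_k | x) directly; p(x | C_k) and p(x) are never estimated.
clf = LogisticRegression().fit(X, y)

print(clf.predict_proba([[2.0]]))  # posterior over both classes
print(clf.predict([[2.0]]))        # argmax decision
```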

Pros:

  • Focused purely on classification accuracy.
  • Needs less data than generative models.
  • Usually performs better when you have enough labeled examples.

Cons:

  • Doesn’t model the data distribution itself, so it can’t generate samples or detect anomalies as easily.

(c) Discriminant Function Approach

Goal: Forget probabilities entirely and learn a direct mapping from input to decision.

You learn a function:

$$f(x) \rightarrow \text{class label}$$

Example:

For a simple two-class problem:

$$f(x) = 0 \;\Rightarrow\; \text{Class } C_1, \qquad f(x) = 1 \;\Rightarrow\; \text{Class } C_2$$

Example:
A simple linear classifier or perceptron that only outputs the class label, not probabilities.
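
A minimal perceptron sketch, using the common $\{-1, +1\}$ label convention rather than the $\{0, 1\}$ outputs above (data and learning rate are invented for illustration):

```python
import numpy as np

def perceptron(X, y, epochs=20, lr=1.0):
    """Learn a discriminant f(x) = sign(w·x + b); no probabilities."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):           # yi in {-1, +1}
            if yi * (xi @ w + b) <= 0:     # misclassified (or on boundary)
                w += lr * yi * xi          # nudge the decision boundary
                b += lr * yi
    return w, b

# Toy linearly separable data (values are invented).
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))  # class labels only, no confidence attached
```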

Pros:

  • Fast and simple.
  • Works when probabilities aren’t needed.

Cons:

  • No notion of uncertainty (we don’t know how confident the model is).
  • Cannot compute $p(C_k \mid x)$, so it is less useful for tasks needing probability estimates.

Step 3: Comparing the Three Approaches

| Approach | What It Learns | Example Algorithms | Pros | Cons |
| --- | --- | --- | --- | --- |
| Generative | How data is generated for each class: estimates $p(x \mid C_k)$ and $p(C_k)$, then computes $p(C_k \mid x)$ via Bayes’ theorem | Naive Bayes, Gaussian Mixture Models (GMMs) | Can generate new data; handles missing data; can detect outliers | Requires large datasets; computationally heavy |
| Discriminative | $p(C_k \mid x)$ directly, without modeling the data distribution | Logistic Regression, Neural Networks | High accuracy for classification; efficient training | Cannot generate data; weaker at detecting anomalies |
| Discriminant Function | A direct mapping $f(x) \rightarrow$ class label (no probabilities) | Perceptron, Support Vector Machine (hard margin) | Simple, fast, and memory efficient | No probability estimates; no measure of confidence |

Step 4: Why It Matters

  • Generative models understand how the world produces data.
  • Discriminative models focus on how to make correct decisions.
  • Discriminant functions do just enough to separate classes, with no probabilistic reasoning behind the decision.

Generative models explain the world.
Discriminative models win in it.
Discriminant functions just decide.


Use Cases

  • Generative: Anomaly detection, missing data handling, data synthesis (e.g., Naive Bayes, GMM, Variational Autoencoders).
  • Discriminative: Most modern ML (e.g., Logistic Regression, Neural Nets).
  • Discriminant Function: Classical decision surfaces (e.g., SVMs without probability calibration).