Generative and Discriminative Models

How Generative and Discriminative Models work.

Mon, Nov 24th
Tags: machine learning · pattern recognition · generative model · discriminative model
Created: 2025-12-15 · Updated: 2025-12-15

Every decision system answers two questions:
1. What do I know about the world?
2. What should I do given what I know?


Step 1: The Classification Process (Two Stages)

Training a model to classify data usually involves two stages:

  1. Inference Stage: Learn from data how likely each class is, given the input.
    → This means learning $p(C_k \mid x)$: the probability that input $x$ belongs to class $C_k$.

  2. Decision Stage: Use those probabilities to make a final choice (e.g., assign a label).
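
A minimal sketch of the two stages in Python, assuming the inference stage has already produced the posteriors (the numbers below are invented for illustration):

```python
import numpy as np

# Hypothetical output of the inference stage: p(C_k | x) for three
# inputs and two classes (one row per input; rows sum to 1).
posteriors = np.array([
    [0.90, 0.10],
    [0.30, 0.70],
    [0.55, 0.45],
])

# Decision stage: assign each input the class with the highest posterior.
labels = np.argmax(posteriors, axis=1)
print(labels)  # [0 1 0]
```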


Step 2: Three Different Ways to Solve This Problem

Bishop describes three approaches, from most to least complex:

(a) Generative Approach

Goal: Model how the data itself is generated for each class.

You learn:

  • $p(x \mid C_k)$: How the data looks inside each class.
  • $p(C_k)$: How common each class is (the prior probability).
  • Then you use Bayes’ theorem to compute the posterior: $p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)}$
  • Finally, you classify based on whichever class gives the highest posterior.

Example:
If you’re classifying emails as “Spam” or “Not Spam,” a generative model tries to learn how spam emails are written and how non-spam emails are written, then compares a new email against both patterns.
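
As a concrete sketch of this pipeline, here is a toy Bernoulli Naive Bayes spam classifier built directly from Bayes’ theorem; the word features, counts, and Laplace smoothing are all invented for illustration:

```python
import numpy as np

# Toy training set: binary indicators for the words ["free", "meeting"].
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]])
y = np.array([1, 1, 0, 0, 1])  # 1 = spam, 0 = not spam

# p(C_k): class priors from class frequencies.
priors = np.array([(y == c).mean() for c in (0, 1)])

# p(x | C_k): per-class word probabilities with Laplace smoothing,
# under the naive assumption that words are independent given the class.
likelihoods = np.array([
    (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in (0, 1)
])

def posterior(x):
    # Bernoulli likelihood of each feature, multiplied across features.
    like = (likelihoods ** x * (1 - likelihoods) ** (1 - x)).prod(axis=1)
    joint = like * priors          # p(x | C_k) p(C_k)
    return joint / joint.sum()     # normalize by p(x): Bayes' theorem

print(posterior(np.array([1, 0])))  # leans heavily toward spam
```

The point to notice is that $p(x \mid C_k)$ and $p(C_k)$ are estimated separately, and the posterior only appears at the end, via Bayes’ theorem.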

Pros:

  • Learns full data distribution: can generate new data (that’s why it’s called “generative”).
  • Can detect outliers/novel data (e.g., something unlike anything seen before).

Cons:

  • Requires lots of data, because learning the entire distribution $p(x \mid C_k)$ is hard.
  • Often wasteful if we only care about final classification, not data generation.

(b) Discriminative Approach

Goal: Skip modeling the data and learn to discriminate between classes directly.

You directly learn:

$$p(C_k \mid x)$$

without ever modeling $p(x \mid C_k)$ or $p(x)$.

Example: Logistic Regression or Neural Networks; they learn directly how inputs map to class probabilities.
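
A minimal sketch using scikit-learn’s LogisticRegression on invented 1-D data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: two classes separated along x (values are invented).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The model fits p(C_k | x) directly; p(x | C_k) and p(x) are never estimated.
clf = LogisticRegression().fit(X, y)

print(clf.predict_proba([[2.0]]))  # posterior over both classes
print(clf.predict([[2.0]]))        # argmax decision
```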

Pros:

  • Focused purely on classification accuracy.
  • Needs less data than generative models.
  • Usually performs better when you have enough labeled examples.

Cons:

  • Doesn’t model the data distribution itself, so it can’t generate samples or detect anomalies as easily.

(c) Discriminant Function Approach

Goal: Forget probabilities entirely and learn a direct mapping from input to decision.

You learn a function:

$$f(x) \rightarrow \text{class label}$$

Example:

For a simple two-class problem:

$$f(x) = 0 \;\Rightarrow\; \text{Class } C_1, \qquad f(x) = 1 \;\Rightarrow\; \text{Class } C_2$$

Example:
A simple linear classifier or perceptron that only outputs the class label, not probabilities.
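
A minimal perceptron sketch, using the common $\{-1, +1\}$ label convention rather than the $\{0, 1\}$ outputs above (data and learning rate are invented for illustration):

```python
import numpy as np

def perceptron(X, y, epochs=20, lr=1.0):
    """Learn a discriminant f(x) = sign(w·x + b); no probabilities."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):           # yi in {-1, +1}
            if yi * (xi @ w + b) <= 0:     # misclassified (or on boundary)
                w += lr * yi * xi          # nudge the decision boundary
                b += lr * yi
    return w, b

# Toy linearly separable data (values are invented).
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))  # class labels only, no confidence attached
```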

Pros:

  • Fast and simple.
  • Works when probabilities aren’t needed.

Cons:

  • No notion of uncertainty (we don’t know how confident the model is).
  • Cannot compute $p(C_k \mid x)$, so it is less useful for tasks needing probability estimates.

Step 3: Comparing the Three Approaches

| Approach | What It Learns | Example Algorithms | Pros | Cons |
| --- | --- | --- | --- | --- |
| Generative | How data is generated for each class: estimates $p(x \mid C_k)$ and $p(C_k)$, then computes $p(C_k \mid x)$ via Bayes’ theorem | Naive Bayes, Gaussian Mixture Models (GMMs) | Can generate new data; handles missing data; can detect outliers | Requires large datasets; computationally heavy |
| Discriminative | $p(C_k \mid x)$ directly, without modeling the data distribution | Logistic Regression, Neural Networks | High accuracy for classification; efficient training | Cannot generate data; weaker at detecting anomalies |
| Discriminant Function | A direct mapping $f(x) \rightarrow$ class label (no probabilities) | Perceptron, Support Vector Machine (hard margin) | Simple, fast, and memory efficient | No probability estimates; no measure of confidence |

Step 4: Why It Matters

  • Generative models understand how the world produces data.
  • Discriminative models focus on how to make correct decisions.
  • Discriminant functions do just enough to separate classes, with no probabilistic reasoning behind the decision.

Generative models explain the world.
Discriminative models win in it.
Discriminant functions just decide.


Use Cases

  • Generative: Anomaly detection, missing data handling, data synthesis (e.g., Naive Bayes, GMM, Variational Autoencoders).
  • Discriminative: Most modern ML (e.g., Logistic Regression, Neural Nets).
  • Discriminant Function: Classical decision surfaces (e.g., SVMs without probability calibration).