Every decision system answers two questions:
1. What do I know about the world?
2. What should I do given what I know?
Step 1: The Classification Process (Two Stages)
Training a model to classify data usually involves two stages:
- Inference Stage: Learn from data how likely each class is, given the input. This means learning \( p(C_k \mid x) \): the probability that input \( x \) belongs to class \( C_k \).
- Decision Stage: Use those probabilities to make a final choice (e.g., assign a label).
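To make the split concrete, here is a minimal Python sketch (with made-up posterior values standing in for the output of a trained model) of what the two stages hand to each other:

```python
import numpy as np

# Hypothetical posteriors p(Ck | x) for three inputs and two classes,
# as if produced by the inference stage of some trained model.
posteriors = np.array([
    [0.9, 0.1],   # input 1: very likely class 0
    [0.4, 0.6],   # input 2: leaning toward class 1
    [0.5, 0.5],   # input 3: completely uncertain
])

# Decision stage: pick the class with the highest posterior probability.
decisions = posteriors.argmax(axis=1)
print(decisions)  # [0 1 0] (the tie on input 3 falls back to the first class)
```

Everything below is about different ways of getting those posterior-like quantities, or skipping them entirely.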
Step 2: Three Different Ways to Solve This Problem
Bishop describes three approaches, from most to least complex:
(a) Generative Approach
Goal: Model how the data itself is generated for each class.
You learn:
- \( p(x \mid C_k) \): How the data looks inside each class.
- \( p(C_k) \): How common each class is (the prior probability).
- Then you use Bayes' theorem to compute the posterior: \( p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{p(x)} \), where \( p(x) = \sum_k p(x \mid C_k)\, p(C_k) \).
- Finally, you classify based on whichever class gives the highest posterior.
Example:
If you're classifying emails as "Spam" or "Not Spam," a generative model tries to learn how spam emails are written and how non-spam emails are written, then compares a new email against both patterns.
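As a rough sketch of this recipe (not Bishop's own example), the toy code below models each class with a 1-D Gaussian as \( p(x \mid C_k) \), hard-codes illustrative priors and parameters rather than estimating them from data, and applies Bayes' theorem to obtain the posterior:

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """1-D Gaussian density, standing in for the class-conditional model p(x | Ck)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Hypothetical parameters, as if estimated from training emails for each class.
params = {
    "spam":     {"mean": 8.0, "std": 2.0, "prior": 0.3},   # p(x | spam),     p(spam)
    "not_spam": {"mean": 3.0, "std": 1.5, "prior": 0.7},   # p(x | not spam), p(not spam)
}

def posterior(x):
    """Bayes' theorem: p(Ck | x) = p(x | Ck) p(Ck) / p(x)."""
    joint = {c: gaussian_pdf(x, p["mean"], p["std"]) * p["prior"] for c, p in params.items()}
    evidence = sum(joint.values())               # p(x) = sum_k p(x | Ck) p(Ck)
    return {c: j / evidence for c, j in joint.items()}

post = posterior(6.0)                            # x might be a count of "spammy" words
print(post, max(post, key=post.get))             # classify by the largest posterior
```

In a real generative classifier the per-class densities and priors would be fit to training data; they are hard-coded here only to keep the sketch self-contained.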
Pros:
- Learns the full data distribution: can generate new data (that's why it's called "generative").
- Can detect outliers/novel data (e.g., something unlike anything seen before).
Cons:
- Requires lots of data, because learning the entire distribution \( p(x \mid C_k) \) is hard.
- Often wasteful if we only care about final classification, not data generation.
(b) Discriminative Approach
Goal: Skip modeling the data; just learn to discriminate between classes.
You directly learn the posterior \( p(C_k \mid x) \), without ever modeling \( p(x \mid C_k) \) or \( p(x) \).
Example: Logistic Regression or Neural Networks, which learn directly how inputs map to class probabilities.
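As a minimal sketch, the snippet below fits scikit-learn's LogisticRegression to a tiny, made-up dataset (the feature values and labels are purely illustrative) and reads off \( p(C_k \mid x) \) directly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy, made-up data: one feature (e.g. a count of "spammy" words) and a 0/1 label.
X = np.array([[0.5], [1.0], [2.0], [6.0], [7.5], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The model learns p(Ck | x) directly; it never models p(x | Ck) or p(x).
clf = LogisticRegression().fit(X, y)

print(clf.predict_proba([[4.0]]))  # posterior over the two classes for a new input
print(clf.predict([[4.0]]))        # decision stage: the most probable class
```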
Pros:
- Focused purely on classification accuracy.
- Needs less data than generative models.
- Usually performs better when you have enough labeled examples.
Cons:
- Doesn't model the data distribution itself, so it can't generate data or detect anomalies as easily.
(c) Discriminant Function Approach
Goal: Forget probabilities; just learn a direct mapping from input to decision.
You learn a function \( f(x) \) that maps each input \( x \) directly onto a class label.
Example:
For a simple two-class problem, \( f(x) \) might be binary valued, with \( f(x) = 0 \) meaning class \( C_1 \) and \( f(x) = 1 \) meaning class \( C_2 \). A simple linear classifier or perceptron works this way: it only outputs the class label, not probabilities.
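A minimal perceptron sketch on toy, made-up data (labels here are +1/-1 rather than 0/1, and the learning rate and epoch count are arbitrary) shows the point: the learned \( f(x) \) returns only a label, never a probability.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Learn weights for a discriminant function f(x) = sign(w.x + b).
    Labels are +1/-1; the function outputs a class, never a probability."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:      # misclassified: nudge the boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy, linearly separable data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.5], [6.0, 5.0], [7.0, 6.5]])
y = np.array([-1, -1, 1, 1])

w, b = train_perceptron(X, y)
f = lambda x: np.sign(x @ w + b)            # the discriminant function itself
print(f(np.array([1.5, 1.0])), f(np.array([6.5, 6.0])))   # -1.0 1.0
```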
Pros:
- Fast and simple.
- Works when probabilities arenât needed.
Cons:
- No notion of uncertainty (we don't know how confident the model is).
- Cannot compute \( p(C_k \mid x) \), so less useful for tasks needing probability estimates.
Step 3: Comparing the Three Approaches
| Approach | What It Learns | Example Algorithms | Pros | Cons |
|---|---|---|---|---|
| Generative | Learns how data is generated for each class: estimates \( p(x \mid C_k) \) and \( p(C_k) \), then computes \( p(C_k \mid x) \) using Bayes' theorem | Naive Bayes, Gaussian Mixture Models (GMMs) | Can generate new data; handles missing data; can detect outliers | Requires large datasets; computationally heavy |
| Discriminative | Learns \( p(C_k \mid x) \) directly without modeling the data distribution | Logistic Regression, Neural Networks | High accuracy for classification; efficient training | Cannot generate data; weaker at detecting anomalies |
| Discriminant Function | Learns a direct mapping \( f(x) \to \) class label (no probabilities) | Perceptron, Support Vector Machine (hard margin) | Simple, fast, and memory efficient | No probability estimates; no measure of confidence |
Step 4: Why It Matters
- Generative models understand how the world produces data.
- Discriminative models focus on how to make correct decisions.
- Discriminant functions do just enough to separate classes, with no probabilistic reasoning behind the decision.
Generative models explain the world.
Discriminative models win in it.
Discriminant functions just decide.
Use Cases
- Generative: Anomaly detection, missing data handling, data synthesis (e.g., Naive Bayes, GMM, Variational Autoencoders).
- Discriminative: Most modern ML (e.g., Logistic Regression, Neural Nets).
- Discriminant Function: Classical decision surfaces (e.g., SVMs without probability calibration).