Markov's Revenge: Chains of Consequence

What if the most powerful technologies in human history (atomic bombs, search engines, artificial intelligence) were all accidents? What if they emerged not from grand visions, but from a Russian mathematician's fury at a religious rival? This is the story of how Markov chains, invented in 1906 to refute a proof of God's existence, became the computational foundation of modernity. It is a story about intellectual warfare, unintended consequences, and the terrifying realization that we live inside abstractions we never chose and cannot escape.

Tue, Dec 2nd
Tags: probability, artificial intelligence, pattern recognition
Created: 2025-12-15 | Updated: 2025-12-15

The Historical Context

1905, Russia, Markov and Nekrasov

The year is 1905. Russia is tearing itself apart. Socialists rise against the Tsar. The empire fractures. And in the middle of this apocalyptic political convulsion, two mathematicians, Andrey Markov and Pavel Nekrasov, are locked in an intellectual war that has nothing to do with artillery or blood, and everything to do with the nature of independence, free will, and the structure of reality itself.

Pavel Nekrasov, the "Tsar of Probability," was a man who understood something profound about power:

Legitimacy requires metaphysical justification.

It is not enough for the Tsar to rule by force. He must rule by necessity. The throne must appear to be the natural conclusion of cosmic order, God's will made manifest in political hierarchy. And so Nekrasov attempted to weaponize probability theory to prove the existence of free will.

His logic was elegant, almost beautiful in its simplicity: "If we observe social statistics (marriages, births, crimes) and these statistics converge to stable averages (the law of large numbers), then the underlying decisions must be independent. And if they are independent, they cannot be determined by material causes. Therefore, they must be acts of free will. Therefore, the soul exists. Therefore, God exists. Therefore, the Tsar's authority is divinely ordained."

This was a theological missile disguised as statistics.

Nekrasov had taken Bernoulli's 200-year-old law of large numbers and transformed it into a proof that the social order was not just political, but ontologically necessary. If you accept his logic, revolution becomes not just unwise, but metaphysically impossible. You cannot overthrow the Tsar because the Tsar is the mathematical conclusion of God's design.

Now enter Andrey Markov: "Andrey the Furious." Markov was an atheist who viewed intellectual dishonesty as a form of violence. He saw Nekrasov's work and recognized it immediately for what it was: the prostitution of mathematics in service of tyranny.

Markov operated under a psychological axiom that most people do not possess: the sanctity of rigor. For Markov, mathematics was not a tool to be bent toward ideology. It was a sovereign territory, a realm where truth existed independent of human desire, political convenience, or divine mandate. When Nekrasov used probability to justify the Tsar, he was not just wrong. He was committing a form of epistemic blasphemy.

Markov understood that Nekrasov's argument, if left unchallenged, would make resistance impossible.

If the people believe that social statistics prove free will, and free will proves God, and God ordains the Tsar, then every revolutionary becomes not just a criminal, but a heretic, a denier of mathematical reality itself. The revolution dies not in the gulags, but in the mind, strangled by the belief that resistance is irrational.

So Markov did the only thing that could shatter this: he proved that convergence does not require independence.

Markov is sitting at his desk. He has decided to prove that dependent systems can still follow the law of large numbers. But he needs data, something where dependence is obvious, undeniable. He cannot use abstract examples. He needs something concrete, something his opponents cannot dismiss.

And so he turns to "Eugene Onegin" by Alexander Pushkin: the sacred text of Russian literature. A poem that every educated Russian knows by heart. A cultural monument.

He strips away the punctuation, the spaces, the meaning. He reduces 20,000 letters into a raw string of vowels and consonants. And then he counts.

He finds that vowel-vowel pairs occur only 6% of the time, far less than the 18% predicted by independence. The letters are dependent. Each letter constrains the next. Russian language itself is a chain of dependencies.

And then he builds his machine. Two states: vowel, consonant. Four transitions. He runs the simulation. The ratios converge. The law of large numbers holds, even though the system is dependent.
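Markov's two-state machine can be sketched in a few lines. The transition probabilities below are illustrative stand-ins, not Markov's exact figures; they are chosen so that the long-run vowel share lands near 43% and vowel-vowel pairs come out near 6%, roughly matching the counts described above:

```python
import random

# Markov's two-state machine: 'V' (vowel) and 'C' (consonant).
# These transition probabilities are illustrative stand-ins, chosen so
# vowel-vowel pairs are rare, roughly matching his Onegin counts.
P = {
    'V': {'V': 0.13, 'C': 0.87},   # a vowel is rarely followed by a vowel
    'C': {'V': 0.66, 'C': 0.34},   # a consonant is usually followed by a vowel
}

random.seed(0)
state = 'C'
n = 200_000
vowels = 0
for _ in range(n):
    # The next state depends only on the current one (the Markov property).
    state = 'V' if random.random() < P[state]['V'] else 'C'
    vowels += (state == 'V')

# Despite the dependence between letters, the vowel ratio converges
# to a stable long-run value: the law of large numbers still holds.
print(round(vowels / n, 3))
```

Run it with different seeds or starting states: the ratio settles to the same value every time, which is exactly the convergence-without-independence that demolished Nekrasov's argument.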

With this, Markov has annihilated Nekrasov's argument.

If dependent systems can still produce convergent statistics, then observing convergence in social data tells you nothing about independence. Which means it tells you nothing about free will. Which means Nekrasov's entire edifice: God, the soul, the Tsar's divine mandate, collapses into dust.

And Markov ends his paper with a single sentence, a dagger wrapped in academic language: "Thus, free will is not necessary to do probability." This was a declaration of war.

Markov had no idea what he had created. Markov himself seemingly didn't care much about how it might be applied to practical events. He wrote, 'I'm concerned only with questions of pure analysis. I refer to the question of the applicability with indifference.'

Markov was not trying to invent a tool. He was trying to win an argument. He was trying to destroy Nekrasov's credibility. He was motivated purely by intellectual rage and socialist ideology.

And yet, in doing so, he accidentally created one of the most powerful computational frameworks in the history of human civilization.

1946, The Manhattan Project: Ulam and the Monte Carlo Method

Stanislaw Ulam is lying in bed, recovering from encephalitis. He plays Solitaire to pass the time. And he begins to wonder: What are the odds that a randomly shuffled game of Solitaire can be won?

The problem is analytically unsolvable. There are 8 × 10⁶⁷ possible arrangements. No equation can capture this.

But then Ulam has the insight: What if I just play hundreds of games and count how many I win?
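Ulam's trick in miniature: when a probability is too hard to derive analytically, simulate the experiment many times and count. A full Solitaire simulator is long, so this toy stand-in estimates something with a known answer, the chance that the top card of a shuffled deck is an ace (exactly 4/52):

```python
import random

# Monte Carlo in miniature: estimate a probability by repeated trials.
# Toy stand-in for Solitaire: how often is the top card of a shuffled
# 52-card deck an ace? (Exact answer: 4/52 ≈ 0.0769.)
random.seed(1)
deck = ['ace'] * 4 + ['other'] * 48

trials = 100_000
wins = 0
for _ in range(trials):
    random.shuffle(deck)        # "deal" a fresh game
    wins += (deck[0] == 'ace')  # "win" if the top card is an ace

estimate = wins / trials
print(round(estimate, 4))
```

The estimate lands close to 4/52, and the error shrinks as the number of trials grows. That is the whole method: sampling replaces solving.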

This is not a Markov chain yet. Solitaire games are independent. But when Ulam returns to Los Alamos, he realizes something profound: Neutrons are not independent. They form chains of dependencies.

A neutron's behavior depends on where it is, what it has done, what it has struck. You cannot model nuclear fission by sampling random outcomes. You need to model a chain of events where each step influences the next.

And John von Neumann, one of the greatest minds of the 20th century, immediately recognizes what they need: A Markov chain.

They build one. They run it on the ENIAC, the first electronic computer. And suddenly, they can simulate nuclear reactions without solving impossible differential equations. They can approximate the behavior of trillions of neutrons by running probabilistic chains.

The atomic bomb becomes computationally feasible because of Markov's 1906 paper, written to destroy Nekrasov's argument about free will.

A mathematician in Tsarist Russia, driven by hatred of religious mysticism, creates a technique to prove that social statistics don't imply free will. Forty years later, that technique becomes the computational engine that allows the United States to calculate how much uranium is needed to vaporize a city. Markov was fighting a battle over Russian metaphysics. He ended up creating the mathematical foundation for nuclear weapons.

1998, Google: The Triumph of Dependent Systems

The internet is exploding. Yahoo is the dominant search engine. But Yahoo has a fatal flaw: it ranks pages by keyword frequency. Quality is invisible to the algorithm. You can game the system by just repeating words in white text on a white background.

Sergey Brin and Larry Page are Stanford PhD students. They are trying to solve a simple problem: How do you measure the quality of a webpage?

And they realize: Links are endorsements.

If Page A links to Page B, that is a vote. But not all votes are equal. If Page A is itself highly linked-to, its vote matters more. If Page A links to 100 other pages, each individual link matters less.

This is a recursive dependency problem. The value of a page depends on the value of the pages linking to it, which depends on the value of the pages linking to them, ad infinitum.

Traditional ranking cannot solve this. They need a system that can handle chains of dependencies. And so they build a Markov chain.

Each webpage is a state. Each link is a transition. They simulate a "random surfer" clicking through the web and track how much time the surfer spends on each page. Over time, the ratios converge. The pages with the highest stationary probability are the most important. They call it PageRank.
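The random-surfer computation can be sketched as power iteration on a tiny, made-up link graph. The four pages and their links are invented for illustration; the damping factor 0.85 is the conventional choice from the PageRank literature, and none of this is Google's production code:

```python
# Toy PageRank: power iteration on a tiny, invented link graph.
links = {
    'A': ['B', 'C'],
    'B': ['C'],
    'C': ['A'],
    'D': ['C'],
}
d = 0.85                         # damping: surfer follows a link 85% of the time
pages = list(links)
n = len(pages)
rank = {p: 1 / n for p in pages}  # start the surfer uniformly at random

for _ in range(50):
    # With probability 1-d the surfer teleports to a random page;
    # otherwise each page splits its rank evenly among its outlinks.
    new = {p: (1 - d) / n for p in pages}
    for p, outs in links.items():
        share = rank[p] / len(outs)
        for q in outs:
            new[q] += d * share
    rank = new

# Pages with more (and better-ranked) inbound links end up on top.
print({p: round(r, 3) for p, r in sorted(rank.items())})
```

Here page C, linked by A, B, and D, ends up with the highest rank, while D, which nobody links to, ends up near the teleportation floor.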

And with it, they dethroned Yahoo. Within four years, Google became the most-used search engine on the planet. Within two decades, Alphabet was worth $2 trillion.

The information empire of the 21st century is built on Markov's 1906 technique, created to win a feud about free will in Tsarist Russia.

What Is a Markov Chain?

A Markov chain is a mathematical system that transitions from one state to another according to probabilistic rules, where the probability of moving to the next state depends ONLY on the current state, not on the sequence of states that preceded it. This is called the Markov Property (or memorylessness).

A Markov chain is the minimal dynamical system that preserves entropy while destroying memory.

Its essence is encoded in the equation:

$$P(X_{t+1}=j \mid X_t=i, X_{t-1}, \ldots) = P(X_{t+1}=j \mid X_t=i)$$

The chain collapses the entire past into one number, the current state, because that is the least information needed to predict the next step.

The past is irrelevant. Only the present matters.

This makes Markov chains the simplest nontrivial model of temporal evolution under information constraints.

The transition matrix P is a stochastic propagator:

$$\pi_{t+1} = \pi_t P$$

This is loosely analogous to a discrete-time Schrödinger evolution, except with probabilities instead of amplitudes.

Irreversibility in Markov chains is informational, not physical.

If a distribution π satisfies

$$\pi P = \pi$$

then π is stationary. Stationarity means the system has lost all memory of its initial condition.

The Components of a Markov Chain

Every Markov chain has three essential components:

  1. State Space (S): The set of all possible states the system can be in.
    • Example: {Vowel, Consonant}
    • Example: {Sunny, Rainy, Cloudy}
    • Example: All possible webpages on the internet
  2. Transition Probabilities (P): The probability of moving from state i to state j.
    • Notation: P_ij = P(move from state i to state j)
    • These must satisfy: ∑_j P_ij = 1 for every i (you must go somewhere, so each row sums to 1)
  3. Initial Distribution (π₀): The probability distribution over states at time 0.
    • Where does the chain start?
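The three components can be assembled for the weather example above. The transition probabilities here are invented for illustration; the point is the shape of the machinery: a state space, a row-stochastic matrix, and an initial distribution evolved by π_{t+1} = π_t P:

```python
# The three components for a toy weather chain.
# All probabilities are invented for illustration.
states = ['Sunny', 'Rainy', 'Cloudy']   # 1. state space S
P = [
    [0.7, 0.1, 0.2],   # transitions out of Sunny
    [0.3, 0.4, 0.3],   # transitions out of Rainy
    [0.4, 0.2, 0.4],   # transitions out of Cloudy
]                       # 2. transition matrix
pi = [1.0, 0.0, 0.0]    # 3. initial distribution pi_0: certainly Sunny

# Each row is a probability distribution: sum_j P[i][j] == 1.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)

# Evolve the distribution: pi_{t+1} = pi_t P.
for _ in range(100):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# After many steps pi approaches the stationary distribution:
# it no longer depends on the fact that we started at Sunny.
print([round(x, 3) for x in pi])
```

Re-run with a different π₀ (say, certainly Rainy) and the loop converges to the same vector, which is the memory loss that stationarity describes.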

Long-Term Behavior of Markov Chains

Long-run behavior depends on graph theory, not probability theory.

Every Markov chain secretly hides:

  • strongly connected components
  • directed acyclic condensation graphs
  • absorbing classes
  • recurrent loops
  • transient basins

Long-term fate is determined by reachability, not randomness.

This leads to the invariant: A Markov chain’s future is determined by its topology, not its numbers.
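A small experiment, with invented numbers, makes the invariant concrete: once the graph contains an absorbing state reachable from every other state, the chain ends up there no matter how the remaining probabilities are tuned.

```python
# Topology, not the exact numbers, decides long-run fate:
# any chain whose graph funnels into an absorbing state ends there,
# whatever the precise transition probabilities are.

def long_run(p):
    # Two transient states A, B and one absorbing trap T.
    # The parameter p only reshapes how A shuffles mass around;
    # it never changes the graph's reachability structure.
    P = {
        'A': {'A': p, 'B': (1 - p) * 0.9, 'T': (1 - p) * 0.1},
        'B': {'A': 0.5, 'B': 0.4, 'T': 0.1},
        'T': {'A': 0.0, 'B': 0.0, 'T': 1.0},   # absorbing: T -> T forever
    }
    pi = {'A': 1.0, 'B': 0.0, 'T': 0.0}
    for _ in range(500):
        pi = {j: sum(pi[i] * P[i][j] for i in pi) for j in pi}
    return pi['T']

# Whatever value p takes, essentially all probability mass
# ends up absorbed in T.
for p in (0.1, 0.5, 0.9):
    print(p, round(long_run(p), 3))
```

Changing p changes how fast absorption happens, never whether it happens. That is the sense in which the chain's future is written in its topology.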