Iterated Prisoner's Dilemma: From Myopia to Reciprocity

When the Prisoner's Dilemma is repeated, strategy is no longer a single move. It becomes a policy over time: "what to do now given what happened before."

What Changes When the Game Is Repeated

Future consequences matter ("shadow of the future")
Reputation and retaliation become possible
Conditional cooperation strategies can emerge (for example, reciprocity)
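
To make "a policy over time" concrete, here is a minimal sketch of a reciprocal strategy (tit-for-tat) playing against itself and against an unconditional defector. The payoff values (5, 3, 1, 0) and the names PAYOFF, tit_for_tat, always_defect, and play are illustrative assumptions, not taken from the text above.

```python
# Illustrative payoffs (assumed): temptation 5 > reward 3 > punishment 1 > sucker 0.
# Keys are (my move, their move); values are (my payoff, their payoff).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    """Myopic policy: defect regardless of what happened before."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees its own history first, then the opponent's.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # (30, 30)
    print(play(tit_for_tat, always_defect))    # (9, 14)
    print(play(always_defect, always_defect))  # (10, 10)
```

Under these assumed payoffs the defector beats tit-for-tat head to head (14 vs 9) yet earns far less than two reciprocators earn together (30 each), which is the sense in which retaliation makes conditional cooperation viable.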

Finite vs Ongoing Interaction

Finite known horizon: backward induction predicts defection in the last round, and the same reasoning then unravels cooperation round by round back toward the start.
Uncertain or effectively ongoing horizon: cooperation can be sustained if future interaction is valuable enough.

This tension is exactly why repeated social dilemmas are useful for studying learned cooperation.
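
One common way to quantify "valuable enough" is a discount factor delta (or, equivalently, a per-round continuation probability). As a sketch under assumptions not stated in the text: suppose both players use a grim-trigger policy (cooperate until the first defection, then defect forever) with temptation T, reward R, and punishment P. Cooperating forever is worth R / (1 - delta); deviating once is worth T + delta * P / (1 - delta). Cooperation holds when delta >= (T - R) / (T - P). The function names and the payoff values below are hypothetical.

```python
def critical_discount(T, R, P):
    """Smallest delta at which grim trigger sustains cooperation (assumes T > R > P)."""
    return (T - R) / (T - P)


def cooperate_value(R, delta):
    """Discounted value of mutual cooperation in every round."""
    return R / (1 - delta)


def defect_value(T, P, delta):
    """Discounted value of defecting once, then mutual defection forever."""
    return T + delta * P / (1 - delta)


if __name__ == "__main__":
    T, R, P = 5, 3, 1  # illustrative payoffs, not from the text
    print(critical_discount(T, R, P))  # 0.5
    for delta in (0.4, 0.6):
        better = cooperate_value(R, delta) >= defect_value(T, P, delta)
        print(delta, "cooperation sustainable" if better else "defection pays")
```

With these assumed payoffs the threshold is 0.5: at delta = 0.4 a one-shot deviation pays, while at delta = 0.6 it does not.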

Why MARL Is the Bridge

In practice, agents do not compute perfect equilibria analytically. They learn from interaction.

Multi-Agent Reinforcement Learning (MARL) lets us test whether cooperative conventions can emerge from repeated play under bounded learning dynamics, exploration, and non-stationarity.
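
As a rough illustration of such an experiment, the sketch below trains two independent tabular Q-learners on the repeated game, each conditioning on the previous joint action. Everything here is an assumption chosen for illustration: the payoff values, the hyperparameters (alpha, gamma, epsilon), the state encoding, and the names choose and train. It is one simple setup among many, not a reference method.

```python
import random

ACTIONS = ("C", "D")
# Illustrative payoffs (assumed): (agent 0 reward, agent 1 reward) per joint action.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def choose(q_row, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row.values())
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(rounds=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Run two independent Q-learners; return their tables and the mutual-cooperation rate."""
    rng = random.Random(seed)
    # State is the previous joint action; None marks the first round.
    states = [None] + [(a, b) for a in ACTIONS for b in ACTIONS]
    q = [{s: {a: 0.0 for a in ACTIONS} for s in states} for _ in range(2)]
    state = None
    mutual_cooperation = 0
    for _ in range(rounds):
        moves = [choose(q[i][state], epsilon, rng) for i in range(2)]
        joint = (moves[0], moves[1])
        rewards = PAYOFF[joint]
        for i in range(2):
            # Standard Q-learning update; the other agent is treated as part
            # of the environment, which is where non-stationarity comes from.
            target = rewards[i] + gamma * max(q[i][joint].values())
            q[i][state][moves[i]] += alpha * (target - q[i][state][moves[i]])
        mutual_cooperation += joint == ("C", "C")
        state = joint
    return q, mutual_cooperation / rounds


if __name__ == "__main__":
    for seed in range(3):
        _, rate = train(seed=seed)
        print(f"seed={seed} mutual cooperation rate={rate:.2f}")
```

Whether cooperation emerges in a setup like this can depend heavily on the seed, the exploration rate, and the discount factor, so results from a single run should be read with caution.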

Next step: Bridge via MARL in the repeated Prisoner's Dilemma
