Two-Timescale Simulations
These pages document the simulation suite implemented in the companion repository and summarize what each model contributes.
Working definition. These simulations test how within-lifetime learning and between-generation selection jointly shape cooperation under controlled social interaction structures.
The common architecture is:
- Fast timescale: learning within lifetimes
- Slow timescale: selection across generations
- Shared topology: all three models place agents on a ring network — see Appendix: The ring network for the rationale and per-model neighbor counts.
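
The nested structure described above can be sketched as an outer generation loop wrapped around an inner learning loop. This is a minimal illustration, not the repository's code: the function names, the single heritable trait (a starting cooperation probability), the payoff numbers, the drift-toward-neighbors learning rule, and the neighbor count `k` are all assumptions chosen to keep the example short.

```python
import random

def ring_neighbors(i, n, k=1):
    """Return the k nearest neighbors on each side of agent i on a ring of n agents."""
    return [(i + d) % n for d in range(-k, k + 1) if d != 0]

def run_two_timescale(n_agents=20, generations=5, rounds=10, k=1, seed=0):
    rng = random.Random(seed)
    # Hypothetical heritable trait: the cooperation probability an agent is born with.
    genomes = [rng.random() for _ in range(n_agents)]
    mean_payoffs = []
    for _ in range(generations):              # slow timescale: selection
        p_coop = list(genomes)                # learned state is reset at birth
        payoffs = [0.0] * n_agents
        for _ in range(rounds):               # fast timescale: learning
            acts = [rng.random() < p for p in p_coop]
            for i in range(n_agents):
                nbrs = ring_neighbors(i, n_agents, k)
                coop_seen = sum(acts[j] for j in nbrs) / len(nbrs)
                # Placeholder payoff: benefit from cooperating neighbors minus own cost.
                payoffs[i] += 3.0 * coop_seen - 1.0 * acts[i] + 1.0
                # Placeholder learning rule: drift toward the neighbors' behavior.
                p_coop[i] += 0.2 * (coop_seen - p_coop[i])
        mean_payoffs.append(sum(payoffs) / n_agents)
        # Payoff-proportional reproduction with small Gaussian mutation.
        parents = rng.choices(genomes, weights=payoffs, k=n_agents)
        genomes = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.05))) for g in parents]
    return genomes, mean_payoffs
```

Calling `run_two_timescale()` returns the final heritable traits and the mean payoff per generation. Note that only `genomes` crosses the generation boundary; everything learned in `p_coop` is discarded when the next generation starts.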
Model progression
| # | Script | Learning mechanism | Extra social features |
|---|---|---|---|
| 1 | two_timescale_reciprocity.py | Simple trust update (Rescorla-Wagner style) | None |
| 2 | two_timescale_q_learning.py | Q-learning (action-value learning) | None |
| 3 | two_timescale_extended.py | Q-learning | Reputation, partner choice, forgiveness |
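
To make the two learning mechanisms in the table concrete, the sketch below shows the textbook form of each update. It is illustrative only: the function names, the dict-of-dicts Q-table, and the encoding of states and actions as `"C"`/`"D"` are assumptions, and the scripts may differ in detail. The parameter names `learning_rate`, `alpha`, `gamma`, and `epsilon` follow the evolved parameters listed on this page.

```python
import random

def update_trust(trust, partner_cooperated, learning_rate):
    """Rescorla-Wagner style update: move trust toward the observed outcome."""
    outcome = 1.0 if partner_cooperated else 0.0
    return trust + learning_rate * (outcome - trust)

def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning step on a dict-of-dicts table q[state][action]."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))
    return max(q[state], key=q[state].get)

# Usage with a hypothetical state encoding (the partner's last move).
q = {s: {"C": 0.0, "D": 0.0} for s in ("C", "D")}
q_update(q, "C", "C", reward=3.0, next_state="C", alpha=0.5, gamma=0.9)
action = choose_action(q, "C", epsilon=0.0, rng=random.Random(0))
```

The trust update tracks a single scalar expectation about a partner, whereas Q-learning maintains a value for every state-action pair and keeps exploring whenever `epsilon` is above zero.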
Navigate the simulation docs
- Model 1: Trust Learning
- Model 2: Q-learning
- Model 3: Extended (reputation, partner choice, forgiveness)
- Network diversity experiment
- Appendices
Core takeaway
Across all three models, cooperation is not a fixed trait. It is an adaptive outcome that depends on interaction structure, learning dynamics, and selective pressures acting over generations.
For which cooperation mechanisms are included and which are out of scope, see Appendix: Cooperation mechanisms and model scope.
What the theory page predicts — and what these simulations test
The theory page sets out a broader conceptual framework than any single simulation can cover. The table below maps each theoretical concept to its status in this simulation suite.
| Theoretical concept | Status | Notes |
|---|---|---|
| Fast timescale — learning within lifetimes | ✅ Implemented | All three models. Trust update (Model 1), Q-learning (Models 2–3). |
| Slow timescale — selection across generations | ✅ Implemented | All three models. Payoff-proportional reproduction with mutation. |
| Selection on learning parameters | ✅ Implemented | Evolution acts on trust_prior, learning_rate, responsiveness, alpha, epsilon, gamma, initial_q_bias, and social parameters. |
| Fitness landscape smoothing by learning | ✅ Demonstrated | Agents discover cooperation during life, raising their fitness and guiding selection toward cooperation-friendly parameters. |
| Interaction regimes (learning accelerates / masks / opposes evolution) | ⚠️ Partial | The one-shot vs repeated comparison tests the accelerating and masking regimes. The opposing regime (short-term defection winning) appears transiently as invasion events but is not isolated experimentally. |
| Baldwin effect — steps 1 & 2 (plasticity enables cooperation; selection favors learnability) | ✅ Demonstrated | Agents that learn cooperation reproduce more; selection shifts the population toward parameter combinations that make learning succeed faster and more robustly. |
| Baldwin effect — step 3 (genetic assimilation: learned behavior becomes innate) | ❌ Not implemented | Offspring always start with reset memories. Cooperation is never directly encoded in genes — it must be relearned every generation. Assimilation would require heritable memory or a genetically fixed cooperative action. |
| Testable prediction: repeated interaction → higher cooperation than one-shot | ✅ Confirmed | All three models show markedly higher cooperation under repeated interaction. |
| Testable prediction: selection favors partner-discrimination parameters | ✅ Confirmed | responsiveness and rejection_threshold evolve upward under repeated interaction. |
| Testable prediction: reputation mechanisms outperform partner-memory in stranger-rich environments | ✅ Confirmed | The network diversity experiment shows the extended model dominating once the stranger fraction exceeds roughly 50%. |
| Testable prediction: trust learning and Q-learning produce different cooperation–payoff trade-offs | ✅ Confirmed | Trust learning maximizes cooperation rate; Q-learning maximizes payoff by retaining exploration. |
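
The slow-timescale entry in the table above, payoff-proportional reproduction with mutation, can be sketched as follows. This is a hedged illustration rather than the suite's implementation: the function name `reproduce`, the payoff shift that keeps sampling weights positive, the Gaussian mutation, and the clipping of parameters to [0, 1] are all assumptions. The parameter names `trust_prior` and `learning_rate` are taken from the table.

```python
import random

def reproduce(population, payoffs, mutation_sd, rng):
    """Sample parents in proportion to payoff, then mutate each inherited parameter."""
    # Assumption: shift payoffs so every sampling weight is strictly positive.
    low = min(payoffs)
    weights = [p - low + 1e-9 for p in payoffs]
    parents = rng.choices(population, weights=weights, k=len(population))
    offspring = []
    for parent in parents:
        # Only heritable parameters are copied; learned memory is not passed on.
        child = {
            name: min(1.0, max(0.0, value + rng.gauss(0.0, mutation_sd)))
            for name, value in parent.items()
        }
        offspring.append(child)
    return offspring

# Usage: four agents, the last of which earned by far the highest payoff.
rng = random.Random(1)
population = [
    {"trust_prior": 0.1, "learning_rate": 0.2},
    {"trust_prior": 0.3, "learning_rate": 0.4},
    {"trust_prior": 0.5, "learning_rate": 0.6},
    {"trust_prior": 0.9, "learning_rate": 0.8},
]
next_generation = reproduce(population, [0.0, 0.0, 0.0, 10.0], mutation_sd=0.05, rng=rng)
```

Because offspring receive parameters but no learned state, a sketch like this cannot produce genetic assimilation, which is consistent with the "Not implemented" status of Baldwin effect step 3 in the table.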