
Two-Timescale Simulations

These pages document the simulation suite implemented in the companion repository and summarize what each model contributes.

Working definition. These simulations test how within-lifetime learning and between-generation selection jointly shape cooperation under controlled social interaction structures.

The common architecture is:

  1. Fast timescale: learning within lifetimes
  2. Slow timescale: selection across generations
  3. Shared topology: all three models place agents on a ring network — see Appendix: The ring network for the rationale and per-model neighbor counts.
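
The three elements above can be sketched as a nested loop: generations on the outside, interaction rounds on the inside, and neighbors taken from a ring. This is only a minimal illustration, not the repository's code. The payoff values, the single heritable trait (an initial cooperation probability), the placeholder learning rule, and the names `PAYOFF`, `ring_neighbors`, and `run_two_timescale` are all assumptions made for this example.

```python
import random

# Assumed prisoner's-dilemma payoffs (not taken from the repository):
# (my move, partner move) -> my payoff, where True means "cooperate".
PAYOFF = {(True, True): 3.0, (True, False): 0.0,
          (False, True): 5.0, (False, False): 1.0}


def ring_neighbors(i, n, k=1):
    """Indices of the k nearest neighbors on each side of agent i on a ring of n agents."""
    return [(i + d) % n for d in range(-k, k + 1) if d != 0]


def run_two_timescale(n_agents=20, generations=5, rounds=50, k=1, seed=0):
    """Return the mean cooperation rate observed in each generation."""
    rng = random.Random(seed)
    # Hypothetical heritable trait: the initial probability of cooperating.
    genomes = [rng.random() for _ in range(n_agents)]
    coop_history = []

    for _gen in range(generations):            # slow timescale: selection
        p_coop = list(genomes)                 # lifetime state restarts from the genome
        payoff = [0.0] * n_agents
        coop_moves = 0

        for _rnd in range(rounds):             # fast timescale: learning
            actions = [rng.random() < p for p in p_coop]
            coop_moves += sum(actions)
            for i in range(n_agents):
                for j in ring_neighbors(i, n_agents, k):
                    payoff[i] += PAYOFF[(actions[i], actions[j])]
                    # Placeholder learning rule: drift toward the neighbor's last move.
                    p_coop[i] += 0.1 * (float(actions[j]) - p_coop[i])

        coop_history.append(coop_moves / (rounds * n_agents))

        # Payoff-proportional reproduction with Gaussian mutation, clipped to [0, 1].
        weights = [w + 1e-9 for w in payoff]
        parents = rng.choices(range(n_agents), weights=weights, k=n_agents)
        genomes = [min(1.0, max(0.0, genomes[p] + rng.gauss(0.0, 0.05)))
                   for p in parents]

    return coop_history
```

Calling `run_two_timescale(generations=10)` returns one cooperation rate per generation. Replacing the placeholder rule with a trust update or Q-learning changes only the inner loop, which is what makes the three models comparable.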

Model progression

| # | Script | Learning mechanism | Extra social features |
| --- | --- | --- | --- |
| 1 | two_timescale_reciprocity.py | Simple trust update (Rescorla-Wagner style) | None |
| 2 | two_timescale_q_learning.py | Q-learning (action-value learning) | None |
| 3 | two_timescale_extended.py | Q-learning | Reputation, partner choice, forgiveness |
Display 1: Three-model progression in learning and social complexity.
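
The two learning mechanisms in the progression can be compared side by side in their textbook forms. The sketch below assumes a scalar trust value in [0, 1] and a single-state (stateless) Q-table keyed by action. The scripts may use richer state representations, so the function names and signatures here are hypothetical.

```python
import random


def trust_update(trust, partner_cooperated, learning_rate):
    """Rescorla-Wagner style update: move trust toward the observed
    outcome by a fraction of the prediction error."""
    outcome = 1.0 if partner_cooperated else 0.0
    return trust + learning_rate * (outcome - trust)


def q_update(q, action, reward, alpha, gamma):
    """One-step Q-learning update for a single-state problem.
    Returns a new dict and leaves the input unchanged."""
    target = reward + gamma * max(q.values())
    new_q = dict(q)
    new_q[action] += alpha * (target - new_q[action])
    return new_q


def epsilon_greedy(q, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(sorted(q))
    return max(q, key=q.get)


# Usage: one trust update and one Q update after a cooperative partner move.
rng = random.Random(0)
trust = trust_update(0.5, partner_cooperated=True, learning_rate=0.2)
q = q_update({"C": 0.0, "D": 0.0}, action="C", reward=3.0, alpha=0.5, gamma=0.0)
next_action = epsilon_greedy(q, epsilon=0.0, rng=rng)
```

The trust rule tracks a belief about the partner, whereas Q-learning tracks the value of the agent's own actions and keeps exploring whenever `epsilon` is above zero.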

Core takeaway

Across all three models, cooperation is not a fixed trait. It is an adaptive outcome that depends on interaction structure, learning dynamics, and selective pressures acting over generations.

For which cooperation mechanisms are included and which are out of scope, see Appendix: Cooperation mechanisms and model scope.


What the theory page predicts — and what these simulations test

The theory page sets out a broader conceptual framework than any single simulation can cover. The table below maps each theoretical concept to its status in this simulation suite.

| Theoretical concept | Status | Notes |
| --- | --- | --- |
| Fast timescale: learning within lifetimes | ✅ Implemented | All three models. Trust update (Model 1), Q-learning (Models 2–3). |
| Slow timescale: selection across generations | ✅ Implemented | All three models. Payoff-proportional reproduction with mutation. |
| Selection on learning parameters | ✅ Implemented | Evolution acts on trust_prior, learning_rate, responsiveness, alpha, epsilon, gamma, initial_q_bias, and social parameters. |
| Fitness landscape smoothing by learning | ✅ Demonstrated | Agents discover cooperation during life, raising their fitness and guiding selection toward cooperation-friendly parameters. |
| Interaction regimes (learning accelerates / masks / opposes evolution) | ⚠️ Partial | The one-shot vs repeated comparison tests the accelerating and masking regimes. The opposing regime (short-term defection winning) appears transiently as invasion events but is not isolated experimentally. |
| Baldwin effect, steps 1 & 2 (plasticity enables cooperation; selection favors learnability) | ✅ Demonstrated | Agents that learn cooperation reproduce more; selection shifts the population toward parameter combinations that make learning succeed faster and more robustly. |
| Baldwin effect, step 3 (genetic assimilation: learned behavior becomes innate) | ❌ Not implemented | Offspring always start with reset memories. Cooperation is never directly encoded in genes; it must be relearned every generation. Assimilation would require heritable memory or a genetically fixed cooperative action. |
| Testable prediction: repeated interaction → higher cooperation than one-shot | ✅ Confirmed | All three models show markedly higher cooperation under repeated interaction. |
| Testable prediction: selection favors partner-discrimination parameters | ✅ Confirmed | responsiveness and rejection_threshold evolve upward under repeated interaction. |
| Testable prediction: reputation mechanisms outperform partner-memory in stranger-rich environments | ✅ Confirmed | Network diversity experiment shows the extended model dominates above ~50% stranger fraction. |
| Testable prediction: trust learning vs Q-learning produce different cooperation–payoff trade-offs | ✅ Confirmed | Trust learning maximises cooperation rate; Q-learning maximises payoff by retaining exploration. |
Display 2: Theory–simulation correspondence.
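
The slow-timescale rows of the table (payoff-proportional reproduction with mutation, selection on learning parameters, and reset memories for offspring) can be illustrated with a short sketch. It borrows the parameter names `trust_prior` and `learning_rate` from the table. The `Agent` class, the `reproduce` function, the Gaussian mutation, and the clipping to [0, 1] are assumptions made for illustration, not the repository's implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Heritable learning parameters (names from the table above).
    trust_prior: float
    learning_rate: float
    # Lifetime state: never inherited, so it always starts empty.
    memory: dict = field(default_factory=dict)
    payoff: float = 0.0


def clip01(x):
    """Keep a parameter inside [0, 1] after mutation."""
    return min(1.0, max(0.0, x))


def reproduce(population, rng, mutation_sd=0.05):
    """Sample parents in proportion to payoff and return mutated offspring.

    Offspring copy only the heritable parameters. Memory and payoff are
    reset, so any cooperation learned by the parent must be relearned.
    """
    # A small constant keeps the total weight positive if all payoffs are zero.
    weights = [max(a.payoff, 0.0) + 1e-9 for a in population]
    parents = rng.choices(population, weights=weights, k=len(population))
    return [
        Agent(
            trust_prior=clip01(p.trust_prior + rng.gauss(0.0, mutation_sd)),
            learning_rate=clip01(p.learning_rate + rng.gauss(0.0, mutation_sd)),
        )
        for p in parents
    ]


# Usage: the parent's learned memory does not carry over to the next generation.
rng = random.Random(42)
parents = [Agent(0.2, 0.1, memory={"neighbor_3": 0.9}, payoff=4.0),
           Agent(0.8, 0.3, memory={"neighbor_1": 0.1}, payoff=12.0)]
children = reproduce(parents, rng)
```

Because only the parameters are copied, this sketch can show selection favoring learnability (steps 1 and 2 of the Baldwin effect) but not genetic assimilation, which would require some of `memory` or a fixed action to be inherited.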