Skip to main content

Interaction Evolved-Learned Cooperation

A Two-Timescale Theory of Cooperation

Cooperation can emerge through two different adaptive processes:

  • Learning within lifetimes (behavioral plasticity, reinforcement learning)
  • Selection across generations (evolutionary dynamics)

These processes operate on entirely different timescales:

Learning timescale <<< Evolutionary timescale

In natural systems they interact. The two-timescale simulation family in this section provides a controlled framework where both processes can be analyzed together.


Fast and Slow Dynamics

Fast timescale — learning

Agents update their policy during their lifetime to increase expected reward.

πt+1πt+απE[R]\pi_{t+1} \leftarrow \pi_t + \alpha \nabla_\pi \mathbb{E}[R]

Where:

  • ss = social state (partner history, local interaction context),
  • aa = action (cooperate or defect),
  • RR = interaction reward/payoff,
  • α\alpha = learning rate.

In this simulation family, reward is defined by donation-game payoffs and partner-specific interaction outcomes.


Slow timescale — evolution

Population composition changes across generations:

frequency_next = (fitness / mean_fitness) * frequency

Fitness is the payoff accumulated over a lifetime of interactions under the learned policy:

fitness(π)=t=1TRt ⁣(st,π(st))\text{fitness}(\pi^*) = \sum_{t=1}^{T} R_t\!\left(s_t,\, \pi^*(s_t)\right)

Where π\pi^* is the policy the agent has learned by the end of its lifetime and TT is the number of interactions per generation. Agents that learned to cooperate with reliable partners and defect against exploiters accumulate higher payoffs and therefore reproduce more.

Evolution therefore selects based on learning outcomes.


The Baldwin Effect

The Baldwin effect describes how learning changes evolutionary trajectories without requiring inheritance of learned behavior. James Mark Baldwin proposed it in 1896–1897 under the name Organic Selection; George Gaylord Simpson gave it its modern name in 1953. The mechanism is Darwinian throughout — no acquired traits are inherited.

The core insight: an organism that can learn a beneficial behavior survives long enough to reproduce even before its genes encode that behavior directly. Over generations, genetic variants that facilitate the learned behavior accumulate — not because the learned trait is passed on, but because those variants are selected for.

Step 1 — Plasticity enables adaptive behavior
Individuals that can learn cooperative strategies survive and reproduce even when their starting genotype alone would not suffice. Plasticity keeps them viable while genetic variants that support cooperation spread through the population.

Step 2 — Selection favors learnability
Evolution favors traits that:

  • reduce learning cost
  • bias initial behavior toward cooperation
  • increase learning speed
  • improve partner discrimination

Step 3 — Partial genetic assimilation
Cooperation becomes easier or faster to learn and may become partially innate. This is distinct from Waddington's genetic assimilation, where a trait becomes fully encoded and developmentally canalized. The Baldwin effect produces facilitation of learning — the learned behavior becomes cheaper or faster — rather than necessarily replacing it.

Why learning smooths the fitness landscape

Hinton and Nowlan (1987) showed computationally that learning converts a needle-in-a-haystack fitness landscape into a smooth gradient that evolution can climb. Without learning, a cooperative genotype must be nearly complete to provide any fitness advantage. With learning, a partial genotype gets finished within a lifetime and still reproduces — turning a cliff into a slope.

In the cooperation context:

  • Without learning: a genotype predisposed to cooperate has low fitness unless partners are also cooperative, which is rare in a defector-dominated population — cooperative genotypes are eliminated before they can spread.
  • With learning: an agent with even a weak cooperative predisposition can learn to discriminate — cooperating with cooperators, withholding from defectors — and accumulate net positive payoff even in a mixed population.

Learning rescues cooperative genotypes that selection alone would eliminate.

Learning creates new selection pressures

A learned cooperative strategy, once widespread in the population, creates selection pressure for genetic variants that achieve the same behavior at lower cost:

  • lower learning rates suffice when the behavior is already partially encoded
  • initial cooperation biases can be set more aggressively when the social environment has become reliably cooperative
  • discrimination thresholds can loosen as defectors become rarer

This feedback loop — learning expands what is reachable; evolution consolidates what learning discovered — is the core dynamic this simulation family is designed to capture.


Fitness Landscape Interpretation

Without learning:

  • cooperative strategies may have low initial fitness
  • evolution cannot discover them

With learning:

  • agents discover cooperative policies during life
  • these increase reproductive success
  • evolution favors individuals predisposed to those behaviors

Learning smooths the fitness landscape and guides selection.


Interaction Regimes

Learning and evolution can interact in different ways:

  1. Learning accelerates evolution
    Plasticity enables rapid discovery of cooperation that selection stabilizes.

  2. Learning masks selection
    If all agents learn equally well, fitness differences shrink.

  3. Learning opposes evolution
    Short-term learned defection may increase individual reward but reduce population fitness.

  4. Coevolution of learning ability
    Selection may favor faster or more robust learners.


Manifestation in the Simulation Suite

In the integrated two-timescale cooperation simulations:

  • interactions are local by default (ring structure)
  • agents update behavior within generation (trust or Q-values)
  • agents reproduce between generations based on accumulated payoff

This creates a Baldwin-style pathway:

  1. Agents learn partner-contingent cooperation during life
  2. Learners with better long-run payoff leave more offspring
  3. Offspring inherit parameter settings that make successful learning more likely

Cooperation shifts from:

context-dependent learning alone -> learning supported by evolved predispositions


What Can Evolve

Selection can act on:

  • trust predispositions (trust_prior)
  • social responsiveness to experience
  • reinforcement-learning parameters (alpha, epsilon, gamma, bias)
  • social-cognitive parameters (reputation weighting, rejection threshold, forgiveness)

This leads to cooperation-friendly learning phenotypes rather than fixed cooperative strategies.


Testable Predictions

The two-timescale framework generates testable predictions:

  • Populations with repeated interaction should evolve higher cooperation than one-shot regimes.
  • Selection should favor parameter combinations that improve partner discrimination.
  • In stranger-rich environments, reputation-mediated mechanisms should outperform pure partner-memory mechanisms.
  • Different learning rules (trust update vs Q-learning) should produce different cooperation-payoff trade-offs.

All of these are testable within the integrated model family.


Relation to Classical Theories

Classical evolutionary models:

  • fixed strategies
  • cooperation via selection only

Pure reinforcement learning models:

  • cooperation within lifetimes
  • no generational dynamics

This framework unifies both:

Cooperation = f(learning dynamics, evolutionary dynamics)


No single landmark paper fully matches this integrated setup across all dimensions (reciprocal cooperation, local interaction structure, reinforcement learning, and between-generation selection over learning parameters). The most relevant work falls into three overlapping groups: reciprocal altruism theory, network reciprocity theory, and modern multi-agent learning studies of social dilemmas.

Within this project, Model 1 maps most directly to direct reciprocity and network reciprocity theory, Model 2 maps most directly to learned reciprocity in multi-agent reinforcement learning, and Model 3 extends that line toward reputation, partner choice, and socially mediated cooperation with strangers.

WorkClosest axis to this simulation familyMain gap vs this simulation family

Trivers (1971), The Evolution of Reciprocal Altruism

Foundational logic of repeated reciprocal cooperationVerbal evolutionary theory rather than an explicit learning-plus-selection simulation

Axelrod and Hamilton (1981), The Evolution of Cooperation

Repeated-interaction conditions under which reciprocity can stabilize cooperationStrategy tournament framework rather than agents that learn within life and evolve between generations

Nowak (2006), Five Rules for the Evolution of Cooperation

Direct reciprocity, indirect reciprocity, and network reciprocity as a unifying frameworkAnalytic synthesis rather than a concrete parameter-evolution simulation

Ohtsuki et al. (2006), A Simple Rule for the Evolution of Cooperation on Graphs and Social Networks

Why local interaction structure can protect cooperationGraph-theoretic selection result rather than within-lifetime partner learning

Claus and Boutilier (1998), The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems

Foundational emergence of cooperation in multi-agent reinforcement learningSmall, abstract cooperative games with no explicit generational inheritance of learning traits

Eccles et al. (2019), Learning Reciprocity in Complex Sequential Social Dilemmas

Reciprocity under temporal and social complexity in learned agentsNo explicit reproduction-selection loop over inherited social-learning parameters
Display 1: Related work positioned by proximity to this simulation family along key conceptual axes.

Taken together, these works capture the core logic behind the present simulation family: reciprocal altruism, repeated interaction, local network structure, and partner-contingent learning. The distinctive contribution here is that these ingredients are combined in a single two-timescale setup where learning unfolds within life and learning parameters themselves evolve across generations.

Adjacent computational environments

Some broader MARL and artificial-society papers remain relevant as neighboring context, but they are no longer the primary fit for this page's argument:

These works are best understood here as adjacent environment or benchmark context, not as the closest direct precedents for the trust-learning, Q-learning, and extended reciprocity models documented in this section.


Simulation Companion

The concrete two-timescale experiments documented for this site are available in:

That section contains model-by-model results (trust learning, Q-learning, extended social mechanisms), the network-diversity experiment, and focused appendices.


Summary

The interaction between learning and selection:

  • couples fast behavioral adaptation with slow population change
  • enables the Baldwin effect
  • allows plasticity to guide evolution
  • explains how cooperation emerges, stabilizes, or collapses

This interaction forms the core mechanism linking nurture and nature in the integrated two-timescale cooperation simulations.


References