
Appendices

Ecological realism of benefit > cost

Applies to: all three models.

The donation game uses benefit b = 3.0 and cost c = 1.0. Setting b > c does not imply free energy, i.e., resources created from nothing. It captures settings where social coordination or information transfer produces synergistic gains.

For pure resource transfer, conservation constraints imply b ≤ c. For alarm signaling, collective defense, and division of labor, however, the effective social benefit of an act can exceed its individual cost.
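A minimal sketch of the standard donation-game payoff, assuming the models score each pairwise encounter this way with b = 3.0 and c = 1.0. The function and constant names are hypothetical and not taken from the models' source code.

```python
# Hypothetical sketch of standard donation-game payoffs (b = 3.0, c = 1.0).
# Names are illustrative; the models may organize this differently.

BENEFIT = 3.0  # b: what an agent gains when its partner cooperates
COST = 1.0     # c: what a cooperator gives up


def donation_payoff(my_action: str, partner_action: str,
                    b: float = BENEFIT, c: float = COST) -> float:
    """Return one agent's payoff for a single pairwise encounter.

    Actions are "C" (cooperate) or "D" (defect). A cooperator pays c;
    an agent receives b whenever its partner cooperates.
    """
    payoff = 0.0
    if my_action == "C":
        payoff -= c
    if partner_action == "C":
        payoff += b
    return payoff


if __name__ == "__main__":
    for me in ("C", "D"):
        for partner in ("C", "D"):
            print(f"me={me} partner={partner} -> {donation_payoff(me, partner):+.1f}")
```

Each cooperative act creates a net surplus of b − c = 2.0, so mutual cooperation pays +2.0 to each agent. Defection still pays more in any single encounter (+3.0 vs +2.0 against a cooperator, 0.0 vs −1.0 against a defector), which is why repeated interaction and partner-specific learning matter.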

The ring network

Applies to: all three models. Neighbor count differs per model (see table below).

All three models are embedded in a ring topology with repeated local encounters.

| Model | Neighbors per agent |
|---|---|
| Model 1 | 8 (4 left, 4 right) |
| Model 2 | 2 (left and right) |
| Model 3 | 2 (left and right) |
Display 1: Ring-neighborhood sizes across the three simulation models.
[Figure: Ring network diagram]
Display 2: Ring-lattice visualization used to communicate local repeated interactions.

The ring is used because it provides repeated local interaction while minimizing additional geometric effects that are stronger on 2D lattices.
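The neighborhoods in Display 1 can be computed with modular arithmetic so that the ring has no edges. This is a hedged sketch: the function name, its signature, and the population size of 20 are hypothetical illustrations, not the models' actual code.

```python
# Hypothetical sketch of ring-lattice neighborhoods with wraparound.


def ring_neighbors(i: int, n_agents: int, k_per_side: int) -> list[int]:
    """Return the indices of agent i's neighbors on a ring of n_agents.

    Agent i interacts with k_per_side agents to its left and k_per_side
    to its right; indices wrap around modulo n_agents.
    """
    if n_agents <= 2 * k_per_side:
        raise ValueError("ring too small for the requested neighborhood")
    left = [(i - d) % n_agents for d in range(k_per_side, 0, -1)]
    right = [(i + d) % n_agents for d in range(1, k_per_side + 1)]
    return left + right


if __name__ == "__main__":
    # Model 1: 4 neighbors per side; Models 2 and 3: 1 neighbor per side.
    print(ring_neighbors(0, 20, 4))  # [16, 17, 18, 19, 1, 2, 3, 4]
    print(ring_neighbors(0, 20, 1))  # [19, 1]
```

Because every agent has the same number of neighbors and no boundary, differences in outcomes come from the interaction rules rather than from position on the lattice.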

Why compare one-shot and repeated interaction?

Applies to: Model 1 only. Models 2 and 3 run repeated interaction only.

Model 1 runs both:

  • lifetime_rounds = 1 (one-shot dominant)
  • lifetime_rounds = 80 (repeated interaction)

This isolates whether cooperation emerges from actual partner-history learning or from static predispositions alone.

Cooperation mechanisms and model scope

Applies to: all three models (with differences noted below).

Included

  • Direct reciprocity (partner-specific learning)
  • Network reciprocity (local repeated interaction)

Out of scope

Kin selection

Agents do not know who their relatives are. Kin selection is not implemented in any model.

Population-wide indirect reciprocity

Included in Model 3 (local form): Agents can observe a partner's reputation score and adjust behavior accordingly.

Not included (population-wide form): Reputation does not spread across the entire population; only local observation.

Group selection

Groups do not reproduce or die as units. All selection acts on individual payoff.

Strategic and psychological interpretation

Applies to: Models 1 and 2 (direct comparison); Model 3 noted where relevant.

Trust learning tends toward high cooperation rates in repeated settings but can be exploited.

Q-learning tends to cooperate less often yet earns more, because it stays selective about when to cooperate and accounts for the future value of a relationship.

The broader interpretation is that adaptive human cooperation resembles selective, future-oriented reciprocity rather than unconditional cooperation.

Rescorla–Wagner style learning

Applies to: Model 1 only.

Model 1 (two_timescale_reciprocity.py) describes its trust update as "Rescorla–Wagner style". This appendix explains what that means.

The Rescorla–Wagner model

The Rescorla–Wagner model (1972) is a mathematical rule for classical conditioning: it describes how the strength of a learned association changes after each trial.

The core update rule is:

$$\Delta V = \alpha \beta (\lambda - V)$$

Where:

| Symbol | Meaning |
|---|---|
| $V$ | Current associative strength (the learned prediction) |
| $\lambda$ | Maximum possible conditioning (the actual outcome) |
| $\lambda - V$ | Prediction error: how surprised the learner is |
| $\alpha$ | Salience of the conditioned stimulus (learning rate) |
| $\beta$ | Salience of the unconditioned stimulus (learning rate) |
Display 2: Rescorla–Wagner model notation: symbols and their meanings.

Key insight: learning only occurs when the outcome is unexpected. If $V = \lambda$, the prediction error is zero and the association does not change. Surprise drives learning; confirmation does not.
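A minimal sketch of the update rule on a single association, showing that each step closes a fixed fraction $\alpha\beta$ of the remaining prediction error and that nothing changes once $V = \lambda$. The parameter values are arbitrary illustrations.

```python
# Minimal sketch of the Rescorla-Wagner update: delta V = alpha * beta * (lambda - V).
# Parameter values are arbitrary illustrations.


def rescorla_wagner_step(v: float, lam: float, alpha: float, beta: float) -> float:
    """Return the associative strength after one conditioning trial."""
    prediction_error = lam - v  # surprise: actual outcome minus prediction
    return v + alpha * beta * prediction_error


if __name__ == "__main__":
    v, lam, alpha, beta = 0.0, 1.0, 0.5, 0.4  # alpha * beta = 0.2
    for trial in range(1, 6):
        v = rescorla_wagner_step(v, lam, alpha, beta)
        print(f"trial {trial}: V = {v:.4f}")
    # When V already equals lambda, the prediction error is zero: no learning.
    print(rescorla_wagner_step(1.0, 1.0, alpha, beta))  # 1.0
```

Starting from $V_0 = 0$ with a constant outcome, the strength after $n$ trials is $V_n = \lambda\,[1 - (1 - \alpha\beta)^n]$, so the increments shrink as the prediction approaches the outcome.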

How this maps onto Model 1

In Model 1, the trust update is:

learned_trust[i, j] += alpha_i * (target_for_i - learned_trust[i, j])

This is structurally identical to the Rescorla–Wagner rule:

| Model 1 term | Rescorla–Wagner equivalent |
|---|---|
| `learned_trust[i, j]` | $V$: current learned prediction |
| `target_for_i` (+1 or −1) | $\lambda$: actual observed outcome |
| `target_for_i - learned_trust[i, j]` | $\lambda - V$: prediction error |
| `alpha_i` | $\alpha\beta$: learning rate |
Display 3: Mapping between Rescorla–Wagner terms and their Model 1 (Trust Learning) equivalents.

The agent updates its trust in partner j in proportion to how surprised it was by j's behavior. If the agent already expected cooperation and got it, trust barely moves. If the agent was betrayed unexpectedly, trust drops sharply.
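A hedged sketch of this behavior using the documented update line. Only that line mirrors Model 1; the helper function, the 2×2 trust matrix, the starting trust of 0.9, and `alpha_i = 0.3` are hypothetical values chosen for illustration, and trust is assumed to live on the same −1 to +1 scale as the targets.

```python
import numpy as np

# Hypothetical sketch of a Model 1-style trust update. The update line matches
# the documented rule; everything else here is an illustrative assumption.


def update_trust(learned_trust: np.ndarray, i: int, j: int,
                 partner_cooperated: bool, alpha_i: float) -> float:
    """Update agent i's trust in partner j in place and return the change."""
    target_for_i = 1.0 if partner_cooperated else -1.0  # observed outcome (lambda)
    before = learned_trust[i, j]
    learned_trust[i, j] += alpha_i * (target_for_i - learned_trust[i, j])
    return float(learned_trust[i, j] - before)


if __name__ == "__main__":
    alpha = 0.3

    trust = np.zeros((2, 2))
    trust[0, 1] = 0.9  # agent 0 already expects agent 1 to cooperate
    small = update_trust(trust, 0, 1, partner_cooperated=True, alpha_i=alpha)
    print(f"expected cooperation: change = {small:+.2f}")  # +0.03

    trust[0, 1] = 0.9
    large = update_trust(trust, 0, 1, partner_cooperated=False, alpha_i=alpha)
    print(f"unexpected betrayal:  change = {large:+.2f}")  # -0.57
```

The same learning rate produces a change nearly twenty times larger after the betrayal, purely because the prediction error is larger.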

Relationship to reinforcement learning

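The Rescorla–Wagner rule is the conceptual ancestor of the temporal-difference (TD) prediction error used in modern reinforcement learning. The simplest action-value update, which ignores future states, has exactly the same shape:

$$Q \leftarrow Q + \alpha\,(r - Q)$$

Full TD methods such as Q-learning replace the immediate reward $r$ with a target that also includes discounted future value, $r + \gamma \max_{a'} Q(s', a')$.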

The key difference is that Rescorla–Wagner describes learning about a stimulus (what to expect from a partner), whereas Q-learning describes learning about actions (what to do). Model 1 uses the simpler, stimulus-learning form; Models 2 and 3 upgrade to full action-value learning.
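To make the difference concrete, here is a hedged sketch contrasting the simplest action-value update with generic one-step tabular Q-learning, whose target adds a discount factor γ times the best next-state value. This is textbook Q-learning, not the implementation of Models 2 or 3; the states `"start"`, `"trusted"`, and `"distrusted"`, the rewards, and the values of α and γ are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: generic tabular updates, not the code of Models 2 or 3.
# State names, rewards, alpha, and gamma are illustrative assumptions.


def simple_action_update(q, action, reward, alpha):
    """Rescorla-Wagner-shaped update on an action value: Q <- Q + alpha (r - Q)."""
    q[action] += alpha * (reward - q[action])


def q_learning_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """One-step Q-learning: the target adds the discounted best next-state value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


if __name__ == "__main__":
    actions = ("C", "D")

    # Without future value, the action with the larger immediate reward wins.
    q_simple = defaultdict(float)
    simple_action_update(q_simple, "C", reward=-1.0, alpha=0.5)
    simple_action_update(q_simple, "D", reward=3.0, alpha=0.5)
    print(dict(q_simple))  # {'C': -0.5, 'D': 1.5}

    # With future value: assume cooperating leads to a state already valued highly.
    q = defaultdict(float)
    q[("trusted", "C")] = 5.0
    q_learning_update(q, "start", "C", -1.0, "trusted", actions, alpha=0.5, gamma=0.9)
    q_learning_update(q, "start", "D", 3.0, "distrusted", actions, alpha=0.5, gamma=0.9)
    print(q[("start", "C")], q[("start", "D")])  # 1.75 1.5
```

With the simple update, defection looks better because only the immediate reward counts. Once the discounted value of the next state enters the target, cooperation can come out ahead, which is one way an action-value learner can account for future relationship value.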

Reference

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). Appleton-Century-Crofts.