Theory of Learning-Selection Interaction

A Two-Timescale Theory of Cooperation

Cooperation can emerge through two different adaptive processes:

Learning within lifetimes (behavioral plasticity, reinforcement learning)
Selection across generations (evolutionary dynamics)

These processes operate on entirely different timescales:

Learning timescale <<< Evolutionary timescale

In natural systems they interact. PredPreyGrass provides a framework in which both occur in the same ecological environment.

Fast and Slow Dynamics

Fast timescale — learning

Agents update their policy during their lifetime to increase expected reward.

\pi_{t+1} \leftarrow \pi_t + \alpha \nabla_\pi \mathbb{E}[R]

Where:

$s$ = state (local ecological context),
$a$ = action (e.g., hunt, wait, share space),
$R$ = reward (in PredPreyGrass: reproduction),
$\alpha$ = learning rate.

Reward is determined by ecological and social interaction.
In PredPreyGrass: reproduction is the only reward.

Slow timescale — evolution

Population composition changes across generations:

frequency_next = (fitness / mean_fitness) * frequency

Fitness depends on learned behavior:

fitness = f(learned_policy)

Evolution therefore selects based on learning outcomes.

The Baldwin Effect

The Baldwin effect describes how learning changes evolutionary trajectories without requiring inheritance of learned behavior.

Step 1 — Plasticity enables adaptive behavior
Individuals that can learn cooperative strategies reproduce more.

Step 2 — Selection favors learnability
Evolution favors traits that:

reduce learning cost
bias initial behavior toward cooperation
increase learning speed

Step 3 — Partial genetic assimilation
Cooperation becomes easier or faster to learn and may become partially innate.

Learning reshapes the fitness landscape by making cooperative strategies reachable.

Fitness Landscape Interpretation

Without learning:

cooperative strategies may have low initial fitness
evolution cannot discover them

With learning:

agents discover cooperative policies during life
these increase reproductive success
evolution favors individuals predisposed to those behaviors

Learning smooths the fitness landscape and guides selection.

Interaction Regimes

Learning and evolution can interact in different ways:

Learning accelerates evolution
Plasticity enables rapid discovery of cooperation that selection stabilizes.
Learning masks selection
If all agents learn equally well, fitness differences shrink.
Learning opposes evolution
Short-term learned defection may increase individual reward but reduce population fitness.
Coevolution of learning ability
Selection may favor faster or more robust learners.

Manifestation in PredPreyGrass

In PredPreyGrass:

reward = reproduction
cooperation increases capture efficiency
learned coordination increases lifetime fitness

This creates a Baldwin pathway:

Predators learn group hunting
Group hunters reproduce more
Offspring inherit traits that improve coordination conditions
(e.g. speed ratios, spatial proximity tendencies, reduced interference)

Cooperation shifts from:

purely learned → facilitated by inherited traits

What Can Evolve

Selection can act on:

morphological traits (speed, vision, energy capacity)
learning parameters
initial policy biases
social attraction or avoidance tendencies

This leads to cooperation-friendly phenotypes rather than fixed cooperative strategies.

Testable Predictions

The two-timescale framework generates testable predictions:

Populations with learning evolve cooperation faster than populations without learning.
Disabling learning after evolution reveals partially innate cooperation.
High plasticity reduces selection gradients.
Low plasticity increases evolutionary pressure on morphology.

All of these can be tested in PredPreyGrass.

Relation to Classical Theories

Classical evolutionary models:

fixed strategies
cooperation via selection only

Pure reinforcement learning models:

cooperation within lifetimes
no generational dynamics

This framework unifies both:

Cooperation = f(learning dynamics, evolutionary dynamics)

No single landmark paper fully matches PredPreyGrass across all dimensions (sequential MARL, ecology, cooperation, population pressure, and learning-selection coupling). The closest lines of work are:

Work	Closest axis to PPG	Main gap vs PPG
Claus and Boutilier (1998), The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems	Foundational emergence of cooperation in MARL	Small, abstract cooperative games; no ecological population dynamics
Leibo et al. (2017), Multi-agent Reinforcement Learning in Sequential Social Dilemmas	Sequential social dilemmas and emergent cooperation	Limited evolutionary/selection dynamics
Hughes et al. (2018), Inequity Aversion Improves Cooperation in Intertemporal Social Dilemmas	Mechanisms that stabilize cooperation in sequential settings	Focus on social preferences, not ecology-selection coupling
Eccles et al. (2019), Learning Reciprocity in Complex Sequential Social Dilemmas	Reciprocity under temporal and social complexity	No explicit ecological reproduction-selection loop
Leibo et al. (2018), Malthusian Reinforcement Learning	Population pressure and ecology-linked MARL adaptation	Less focused on explicit cooperative hunting ecology
Zheng et al. (2018), MAgent	Large-scale many-agent ecological-like environments	Benchmark platform, not a specific two-timescale cooperation theory
Suarez et al. (2019), Neural MMO	Persistent multi-agent worlds with resource pressure and emergent roles	Different task framing; weaker explicit learning-selection theory framing
Leibo et al. (2021), Melting Pot	Broad evaluation of social behaviors in MARL	Evaluation suite rather than a single ecological cooperation model

Taken together, these works bracket PPG's design space. PPG is closest to their intersection, rather than to any one benchmark or theory.

Summary

The interaction between learning and selection:

couples fast behavioral adaptation with slow population change
enables the Baldwin effect
allows plasticity to guide evolution
explains how cooperation emerges, stabilizes, or collapses

This interaction forms the core mechanism linking nurture and nature in PredPreyGrass.

A Two-Timescale Theory of Cooperation​

Fast and Slow Dynamics​

Fast timescale — learning​

Slow timescale — evolution​

The Baldwin Effect​

Fitness Landscape Interpretation​

Interaction Regimes​

Manifestation in PredPreyGrass​

What Can Evolve​

Testable Predictions​

Relation to Classical Theories​

Related Work (Closest by Axis)​

Summary​