Implementing Behavioral Evolution Strategies in PredPreyGrass
Cooperative vs Opportunistic Predator Policies (Codex Context File)
This document is meant to be used directly as context for Codex in VS Code.
Place it in your repository (e.g. docs/behavioral_strategies.md) and instruct Codex
to use it as implementation guidance.
It intentionally contains only actionable design, not theory.
Goal
We want to test behavioral evolution.
- All predators have identical morphology and life-history
- Differences are purely behavioral
- Strategies are implemented as distinct policies
- Strategy identity is preserved on reproduction
Strategy A: Cooperative Hunter
Behavioral intent
A cooperative hunter:
- waits for nearby predators before attacking large prey
- approaches large prey together
- avoids attacking large prey alone if success probability is low
- rallies toward predator clusters when alone
This strategy is coordination-first.
Decision logic (conceptual)
- Detect nearby predators in Moore neighborhood
- Detect nearby prey (distinguish large vs small if applicable)
- If large prey is adjacent or reachable:
  - attack only if enough predators are nearby
- If large prey is visible but cooperation condition is not met:
  - move toward nearest predator / predator cluster
- Otherwise:
  - forage opportunistically or patrol
Pseudocode
def cooperative_policy(obs):
    predators_near = count_predators(obs, radius=1)
    large_prey_dir = nearest_large_prey(obs)
    small_prey_dir = nearest_small_prey(obs)
    cluster_dir = toward_predator_cluster(obs)  # None if no predators are visible

    if large_prey_dir is not None:
        if predators_near >= 2:
            return move(large_prey_dir)  # coordinated attack
        # Not enough allies nearby: rally instead of attacking alone
        return move(cluster_dir) if cluster_dir is not None else stay()

    if small_prey_dir is not None:
        return move(small_prey_dir)

    return move(cluster_dir) if cluster_dir is not None else random_walk()
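The pseudocode above can be turned into a small runnable sketch. This version is only an illustration under stated assumptions: entities are given as `(dx, dy)` offsets relative to the acting predator rather than as PredPreyGrass observation channels, "predator cluster" is simplified to the nearest predator, and the random walk is replaced by staying in place to keep the result deterministic. The names `cooperative_step`, `rally`, and `min_allies` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Offset = Tuple[int, int]  # (dx, dy) relative to the acting predator (assumed format)
STAY: Offset = (0, 0)


def chebyshev(offset: Offset) -> int:
    """Moore-neighborhood distance: number of king moves to reach the offset."""
    return max(abs(offset[0]), abs(offset[1]))


def nearest(offsets: Sequence[Offset]) -> Optional[Offset]:
    """Closest offset by Moore distance, or None if the sequence is empty."""
    return min(offsets, key=chebyshev) if offsets else None


def step_toward(target: Offset) -> Offset:
    """One Moore-neighborhood step in the direction of the target."""
    def sign(v: int) -> int:
        return (v > 0) - (v < 0)
    return (sign(target[0]), sign(target[1]))


def rally(predators: Sequence[Offset]) -> Offset:
    """Move toward the nearest predator; stay if none is visible or one is adjacent."""
    ally = nearest(predators)
    if ally is None or chebyshev(ally) <= 1:
        return STAY
    return step_toward(ally)


def cooperative_step(predators: Sequence[Offset],
                     large_prey: Sequence[Offset],
                     small_prey: Sequence[Offset],
                     min_allies: int = 2) -> Offset:
    allies_near = sum(1 for p in predators if chebyshev(p) <= 1)
    target = nearest(large_prey)
    if target is not None:
        if allies_near >= min_allies:
            return step_toward(target)  # coordinated attack
        return rally(predators)         # wait for support instead of attacking alone
    target = nearest(small_prey)
    if target is not None:
        return step_toward(target)      # small prey needs no cooperation
    return rally(predators)


# Two adjacent allies: commit to the large prey.
print(cooperative_step([(1, 0), (0, 1)], [(2, 2)], []))  # (1, 1)
# Alone with a distant ally: move toward the ally, not the prey.
print(cooperative_step([(3, 0)], [(2, 2)], []))          # (1, 0)
```

Returning a movement offset rather than an action index keeps the sketch independent of the environment's action encoding, which would need to be mapped separately.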
Strategy B: Opportunistic Defector
Behavioral intent
An opportunistic defector:
- attacks whenever prey is nearby
- ignores coordination cues
- exploits situations where others soften prey
- shadows predator clusters to steal opportunities
This strategy is greedy and myopic.
Decision logic (conceptual)
- If any prey is adjacent or reachable:
  - attack immediately
- If prey is visible:
  - chase prey even if alone
- If no prey visible:
  - shadow predator clusters or patrol randomly
Pseudocode
def opportunistic_policy(obs):
    prey_dir = nearest_any_prey(obs)
    if prey_dir is not None:
        return move(prey_dir)  # always commit

    cluster_dir = toward_predator_cluster(obs)  # None if no predators are visible
    return move(cluster_dir) if cluster_dir is not None else random_walk()
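As a runnable counterpart to the pseudocode above, the following sketch shows the defector's rule on the same kind of toy input. It assumes entities are passed as `(dx, dy)` offsets relative to the acting predator, treats "shadowing a cluster" as approaching the nearest predator, and stays in place where the pseudocode would random-walk. The name `opportunistic_step` is hypothetical.

```python
from typing import Sequence, Tuple

Offset = Tuple[int, int]  # (dx, dy) relative to the acting predator (assumed format)


def moore_distance(offset: Offset) -> int:
    return max(abs(offset[0]), abs(offset[1]))


def unit_step(target: Offset) -> Offset:
    """One Moore-neighborhood step in the direction of the target."""
    return tuple((v > 0) - (v < 0) for v in target)


def opportunistic_step(predators: Sequence[Offset], prey: Sequence[Offset]) -> Offset:
    if prey:
        # Always commit to the closest prey, regardless of nearby allies.
        return unit_step(min(prey, key=moore_distance))
    if predators:
        ally = min(predators, key=moore_distance)
        if moore_distance(ally) > 1:
            return unit_step(ally)  # shadow the nearest predator
    return (0, 0)                   # nothing to chase or follow


# A lone defector still attacks, where a cooperative hunter would wait.
print(opportunistic_step([], [(2, 2)]))  # (1, 1)
```

The only structural difference from a cooperative controller is the missing ally check before committing to prey, which is what makes the two strategies comparable in a controlled test.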
Implementation Notes (PredPreyGrass-specific)
1. Actions
- Predators typically "attack" by moving onto prey cells
- A predator avoids an attack simply by not stepping onto a prey cell while it is alone
No new actions are required.
2. Observations required
The policy assumes access to:
- predator positions (same type)
- prey positions (distinguish large vs small if needed)
- local neighborhood geometry
These are already present in PredPreyGrass observation channels.
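To illustrate how the helper `count_predators` used in the pseudocode might read such channels, here is a minimal sketch. It assumes a channel-first square observation `obs[channel][y][x]` centered on the acting agent, and the channel indices `PREDATOR_CHANNEL` and `PREY_CHANNEL` are placeholders that must be checked against the actual environment. The indexing works for nested lists and for NumPy arrays alike.

```python
from typing import List, Tuple

Offset = Tuple[int, int]  # (dx, dy) relative to the observing agent

# Hypothetical channel indices; verify against the environment's observation spec.
PREDATOR_CHANNEL = 1
PREY_CHANNEL = 2


def offsets_in_channel(obs, channel: int) -> List[Offset]:
    """Offsets of all non-empty cells in one channel, excluding the center cell."""
    grid = obs[channel]
    size = len(grid)
    center = size // 2  # the observing agent is assumed to sit at the center
    return [
        (x - center, y - center)
        for y in range(size)
        for x in range(size)
        if grid[y][x] > 0 and (x, y) != (center, center)
    ]


def count_predators(obs, radius: int = 1) -> int:
    """Number of other predators within the given Moore radius."""
    return sum(
        1
        for dx, dy in offsets_in_channel(obs, PREDATOR_CHANNEL)
        if max(abs(dx), abs(dy)) <= radius
    )


# Toy 3-channel, 5x5 observation.
obs = [[[0] * 5 for _ in range(5)] for _ in range(3)]
obs[PREDATOR_CHANNEL][2][2] = 1  # the observing predator itself (center)
obs[PREDATOR_CHANNEL][2][3] = 1  # another predator one cell to the right
obs[PREY_CHANNEL][0][0] = 1      # prey in the top-left corner

print(count_predators(obs, radius=1))         # 1
print(offsets_in_channel(obs, PREY_CHANNEL))  # [(-2, -2)]
```

Distinguishing large from small prey would need either separate channels or an energy threshold on cell values; which of these applies depends on the environment configuration.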
3. Strategy identity (policy identity)
Each predator must have a strategy label, e.g.:
type_1_predator_A_0
type_1_predator_B_7
Mapping rule:
A → cooperative policy
B → opportunistic policy
4. Policy mapping (RLlib)
policy_mapping_fn must map agent IDs to different policies:
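def policy_mapping_fn(agent_id, *args, **kwargs):
    # Example agent_id: type_1_predator_A_3 -> policy ID: type_1_predator_A
    parts = agent_id.split("_")
    type_ = parts[1]   # "1"
    role = parts[2]    # "predator"
    allele = parts[3]  # strategy label: "A" or "B"
    # The trailing agent index (parts[4]) is dropped so that
    # all agents with the same strategy share one policy.
    return f"type_{type_}_{role}_{allele}"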
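Every ID returned by the mapping function has to exist as a key in the policies passed to RLlib's multi-agent configuration. The sketch below derives that set of policy IDs from a list of agent IDs. It also adds a guard for agent IDs without a strategy label (for example `type_1_prey_3`), which is an assumption about how other agents are named and not something this document specifies. The helper `required_policy_ids` is hypothetical.

```python
from typing import Iterable, Set


def policy_mapping_fn(agent_id: str, *args, **kwargs) -> str:
    parts = agent_id.split("_")
    if len(parts) == 5:
        # type_<n>_<role>_<strategy>_<index>: keep the strategy label.
        return "_".join(parts[:4])
    # Assumed fallback for IDs without a strategy label: type_<n>_<role>_<index>.
    return "_".join(parts[:3])


def required_policy_ids(agent_ids: Iterable[str]) -> Set[str]:
    """Policy IDs that must be defined for the given agents."""
    return {policy_mapping_fn(agent_id) for agent_id in agent_ids}


agents = [
    "type_1_predator_A_0",
    "type_1_predator_A_3",
    "type_1_predator_B_7",
    "type_1_prey_3",
]
print(sorted(required_policy_ids(agents)))
# ['type_1_predator_A', 'type_1_predator_B', 'type_1_prey']
```

Computing this set once at setup time gives a simple check that no agent maps to an undefined policy.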
5. Reproduction (inherit strategy)
When a predator reproduces:
- child inherits parent strategy label (A or B)
- no mutation required for baseline test
Example:
type_1_predator_A_5 → child: type_1_predator_A_12
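A minimal sketch of this inheritance rule is shown below. It assumes new agent IDs are built from a running index counter; an environment that reuses a fixed pool of agent slots would instead pick a free slot carrying the same strategy label. The names `AGENT_ID_PATTERN` and `child_id` are hypothetical.

```python
import itertools
import re
from typing import Iterator

# type_<n>_predator_<strategy>_<index>, e.g. type_1_predator_A_5
AGENT_ID_PATTERN = re.compile(r"^(type_\d+_predator)_([A-Z])_(\d+)$")


def child_id(parent_id: str, next_index: Iterator[int]) -> str:
    """Build a child ID that copies the parent's type and strategy label."""
    match = AGENT_ID_PATTERN.match(parent_id)
    if match is None:
        raise ValueError(f"not a strategy-labeled predator ID: {parent_id!r}")
    prefix, strategy, _parent_index = match.groups()
    # No mutation in the baseline test: the strategy label is copied unchanged.
    return f"{prefix}_{strategy}_{next(next_index)}"


free_indices = itertools.count(12)  # assumed: 12 is the next unused index
print(child_id("type_1_predator_A_5", free_indices))  # type_1_predator_A_12
print(child_id("type_1_predator_B_7", free_indices))  # type_1_predator_B_13
```

Because the label travels with the ID, the policy mapping routes the child to its parent's policy without any extra bookkeeping.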
This is essential for behavioral evolution.
How to use this file with Codex
- Save this file as:
  docs/behavioral_strategies.md
- In VS Code, open Codex and prompt:
  Use docs/behavioral_strategies.md as context.
  Implement the cooperative and opportunistic predator strategies
  as scripted baseline controllers in the PredPreyGrass environment.
  Do not change morphology or env mechanics.
- Optionally ask Codex to:
  - convert scripted policies into PPO-initialized policies
  - add logging of strategy frequencies over time
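The strategy-frequency logging mentioned in the last item could look roughly like the sketch below. It assumes the IDs of living agents are available each step and follow the `type_1_predator_A_3` naming used in this document; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def strategy_counts(agent_ids: Iterable[str]) -> Dict[str, int]:
    """Count living predators per strategy label, ignoring all other agents."""
    counts: Counter = Counter()
    for agent_id in agent_ids:
        parts = agent_id.split("_")
        if len(parts) == 5 and parts[2] == "predator":
            counts[parts[3]] += 1
    return dict(counts)


def strategy_frequencies(agent_ids: Iterable[str]) -> Dict[str, float]:
    """Share of each strategy among living predators (empty if none are alive)."""
    counts = strategy_counts(agent_ids)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


living = [
    "type_1_predator_A_0",
    "type_1_predator_A_1",
    "type_1_predator_B_7",
    "type_1_prey_3",
]
print(strategy_counts(living))  # {'A': 2, 'B': 1}
```

Appending the result of `strategy_counts` to a list once per step yields the time series needed to judge the outcome of a run.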
Success criterion
You have implemented behavioral evolution if:
- cooperative and opportunistic predators coexist initially
- reproduction changes their relative frequencies
- extinction or coexistence occurs without changing body parameters
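These criteria can be checked mechanically from a logged history of strategy counts. The sketch below assumes the history is a list of `(cooperative_count, opportunistic_count)` pairs, one per logged step; the outcome labels and function names are illustrative only.

```python
from typing import List, Tuple

History = List[Tuple[int, int]]  # (count_A, count_B) per logged step


def classify_outcome(history: History) -> str:
    """Label a run by which strategies are present at the start and at the end."""
    if not history:
        raise ValueError("history is empty")
    start_a, start_b = history[0]
    end_a, end_b = history[-1]
    if start_a == 0 or start_b == 0:
        return "invalid_start"  # both strategies must coexist initially
    if end_a == 0 and end_b == 0:
        return "both_extinct"
    if end_b == 0:
        return "cooperative_only"
    if end_a == 0:
        return "opportunistic_only"
    return "coexistence"


def frequency_shift(history: History) -> float:
    """Change in the cooperative share between the first and last logged step."""
    def share(a: int, b: int) -> float:
        return a / (a + b) if a + b else 0.0
    return share(*history[-1]) - share(*history[0])


run = [(10, 10), (12, 6), (15, 0)]
print(classify_outcome(run))  # cooperative_only
print(frequency_shift(run))   # 0.5
```

A nonzero `frequency_shift` with unchanged body parameters indicates that reproduction alone altered the strategy mix.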
Scope note
This document intentionally avoids:
- reward shaping
- architectural changes
- evolutionary mutation
- curriculum learning
It defines the minimal, clean behavioral-evolution test case.