To do

Nature v Nurture definitions

  • What is nurture? Purely self-nurtured, man-made nurture, or nature-provided nurture? If someone is born near the equator in Africa, is that nurture? Is the behavior of ancestors nurture or nature? Is a physical inheritance nurture or nature?

addition to Prisoner's Dilemma

Stag hunt matrix? To learn cooperation.
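A stag-hunt payoff matrix could be sketched like this (the payoff values are illustrative, chosen only to satisfy the game's ordering; nothing here comes from the project code):

```python
# Stag hunt: row player's payoffs. Actions: 0 = stag (cooperate), 1 = hare (defect).
# Hunting stag pays best, but only if the partner joins; hare is a safe solo option.
STAG, HARE = 0, 1
PAYOFF = [
    [4, 0],  # I hunt stag: 4 if partner also hunts stag, 0 if partner defects
    [3, 3],  # I hunt hare: guaranteed 3 regardless of partner
]

def reward(my_action, other_action):
    return PAYOFF[my_action][other_action]

# Both (stag, stag) and (hare, hare) are Nash equilibria: (stag, stag)
# is payoff-dominant, (hare, hare) is risk-dominant -- which is why the
# stag hunt is a cleaner model of *learned* cooperation than the
# Prisoner's Dilemma, where defection strictly dominates.
assert reward(STAG, STAG) > reward(HARE, HARE) > reward(STAG, HARE)
```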

Why do humans cooperate?

The surface answers are many:

  • survival,
  • reciprocity,
  • empathy,
  • norms,
  • reputation,
  • laws,
  • morality.

These are important mechanisms, but they can be reduced to a smaller set of structural reasons.

Having options makes people happy

  • Do changing seasons (more options) make people happier than a fixed climate? This would imply a relation between distance from the equator and happiness.

Interdependence of outcomes

Cooperation becomes rational when payoffs are coupled and agents cannot optimize fully on their own.

Examples include:

  • shared resources,
  • division of labor,
  • ecological feedback loops,
  • public goods,
  • tasks that exceed solo capacity.

Temporal extension

Cooperation becomes more likely when interactions repeat over time. Short-term sacrifice can produce long-term gain through:

  • reciprocity,
  • trust,
  • reputation,
  • learning,
  • cultural transmission.

Internalization of group structure

Humans often carry social regulation inside the individual through:

  • empathy,
  • guilt,
  • shame,
  • norms,
  • identity.

This helps explain why cooperation can persist even when direct monitoring or immediate reward is weak.

One-sentence synthesis

Cooperation emerges when independent optimization breaks down, the future matters, and social coordination becomes internalized.

Why this matters here

For this site, the central question is not whether cooperation exists, but how it fits into a broader theory of human behavior and under what minimal conditions it emerges:

  • through learning within a lifetime,
  • through selection across generations,
  • and through the interaction between those two timescales.

That is why cooperation sits near the center of the project. It is not the whole of human behavior, but it is one of the clearest cases in which behavior cannot be understood at the level of the isolated individual alone.

Brainstorming

  • Use Leary's Rose in Learned Cooperation?
  • "The Inevitability of Selfishness"
  • "Cooperation is not trivial. Competition is intuitively more sensible due to the inevitability of selfishness."

[X] Cooperation by bundled forces of predators

  • A Predator can eat if it enters the Moore neighborhood of a Prey
  • It can only eat alone if it has a higher energy level than the Prey
  • If more Predators are in the Moore neighborhood, they can eat the Prey only if their cumulative energy is greater than or equal to the Prey's. If so, they divide the energy proportionally to their own energy.
  • "When you can't do it alone you must do it together"
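The bundled-kill rule above can be sketched as a small helper (illustrative only; `attempt_kill` and its signature are assumptions, not the actual PredPreyGrass API):

```python
def attempt_kill(prey_energy, predator_energies):
    """Predators in the prey's Moore neighborhood succeed only if their
    cumulative energy is greater than or equal to the prey's energy;
    on success, the prey's energy is divided proportionally to each
    predator's own energy."""
    total = sum(predator_energies)
    if total < prey_energy:      # not strong enough, even together
        return None              # the hunt fails
    # proportional division of the prey's energy
    return [prey_energy * e / total for e in predator_energies]

# "When you can't do it alone you must do it together":
assert attempt_kill(10.0, [6.0]) is None             # solo hunt fails
assert attempt_kill(10.0, [6.0, 4.0]) == [6.0, 4.0]  # joint hunt succeeds
```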

Similarities between "Nature" and "Nurture"

  • maybe not so different
  • The natural selection of lifetime learning
  • diminishing returns on the happy behaviors (learning is open-ended, like evolution)
  • the reward system is adaptive, like evolution

Differences

  • "Nature" is very binary (survival/reproduction); "nurture" is more continuous and less fatal.

repo tit-for-tat

  • iterated tit-for-tat
  • with a known finite ending, players always end up defecting (backward induction)
  • MARL training with random ending periods (randomized max_steps) might result in cooperation
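The intuition in the last bullet can be sketched with an iterated Prisoner's Dilemma whose ending is geometric rather than fixed (payoffs and the continuation probability are illustrative choices):

```python
import random

# Standard PD payoffs (T=5 > R=3 > P=1 > S=0). With a known last round,
# backward induction unravels cooperation; a random stopping time
# (continue with probability p each step) removes the known last round.
C, D = 0, 1
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(opponent_last_move):
    return C if opponent_last_move is None else opponent_last_move

def play(p_continue=0.97, seed=0):
    rng = random.Random(seed)
    last_a = last_b = None
    score_a = score_b = 0
    while True:
        a, b = tit_for_tat(last_b), tit_for_tat(last_a)
        ra, rb = PAYOFF[(a, b)]
        score_a, score_b = score_a + ra, score_b + rb
        last_a, last_b = a, b
        if rng.random() > p_continue:  # random ending period
            return score_a, score_b

sa, sb = play()
assert sa == sb and sa % 3 == 0  # two TFT players cooperate every round
```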

[ ] Direct reciprocity without coordination under necessity

Goal:

  • Remove "we must cooperate or we cannot kill the prey" completely.
  • Study whether predators learn to help because help is returned later, not because a kill is impossible alone.

Recommended environment concept:

  • Start from a rabbits-only or shared_prey style environment, not mammoths.
  • Every prey is individually catchable by one predator.
  • Reproduction remains the only learning reward, so the setup stays aligned with the rest of PredPreyGrass.

Cooperative act:

  • After a successful solo kill, the capturing predator can choose share_food = 0/1.
  • If share_food = 1 and another predator is within Moore neighborhood:
  • A fixed fraction of prey energy is transferred to one nearby predator.
  • The sharer keeps the remainder and is immediately worse off than under selfish consumption.
  • Sharing is therefore voluntary and immediately costly.
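A minimal sketch of the share_food mechanic (the function name and the share_fraction value are assumptions, not project config):

```python
def consume_and_share(prey_energy, share_food, neighbor_present,
                      share_fraction=0.25):
    """After a solo kill, the capturer may transfer a fixed fraction of
    the prey's energy to one predator in its Moore neighborhood.
    Returns (capturer_gain, partner_gain)."""
    if share_food and neighbor_present:
        given = share_fraction * prey_energy
        return prey_energy - given, given
    return prey_energy, 0.0

# Sharing is voluntary and immediately costly to the sharer:
assert consume_and_share(10.0, share_food=1, neighbor_present=True) == (7.5, 2.5)
assert consume_and_share(10.0, share_food=0, neighbor_present=True) == (10.0, 0.0)
```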

Alternative cooperative act:

  • assist_hunt = 0/1 for a nearby predator that is chasing prey.
  • Assistance lowers the target predator's hunting cost or raises its capture chance.
  • Assistance is never required for capture, only beneficial.

Direct reciprocity mechanism:

  • Each predator keeps private memory of specific partners, not public reputation.
  • Example memory variable: trust[i][j] = how much predator i expects predator j to return favors.
  • Increase trust[i][j] when j shared with or assisted i.
  • Decrease trust[i][j] when j refused to share or help in a relevant opportunity.
  • Let trust slowly decay back toward neutral so reciprocity must be maintained.
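The trust[i][j] bookkeeping could look like this (class and parameter names are illustrative; the learning rate and decay values are arbitrary):

```python
class TrustMemory:
    """Private pairwise memory: trust[i][j] = how much predator i
    expects predator j to return favors. No public reputation."""
    def __init__(self, n_predators, lr=0.2, decay=0.99):
        self.trust = [[0.0] * n_predators for _ in range(n_predators)]
        self.lr, self.decay = lr, decay

    def record(self, i, j, helped):
        """j shared with / assisted i (helped=True), or refused in a
        relevant opportunity (helped=False)."""
        target = 1.0 if helped else -1.0
        self.trust[i][j] += self.lr * (target - self.trust[i][j])

    def step(self):
        """Slow decay toward neutral: reciprocity must be maintained."""
        self.trust = [[t * self.decay for t in row] for row in self.trust]

mem = TrustMemory(3)
mem.record(0, 1, helped=True)   # predator 1 shared with predator 0
mem.record(0, 2, helped=False)  # predator 2 refused
assert mem.trust[0][1] > 0.0 > mem.trust[0][2]
```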

Observation / state:

  • Standard spatial observation stays intact.
  • Add one extra private observation signal for predators only:
  • At nearby predator positions, encode focal-agent trust toward that predator.
  • Or provide a compact summary such as nearest-partner trust / mean nearby trust.
  • Do not expose a public reputation score; otherwise the mechanism shifts toward indirect reciprocity.

Why this is no longer necessity:

  • A predator can always eat alone.
  • Cooperation now means giving up immediate energy for another predator.
  • The only reason to do this is expectation of future return through repeated interaction.

Core experimental conditions:

  • Baseline selfish condition: no memory, no partner-specific trust signal.
  • Direct reciprocity condition: private partner memory enabled.
  • Identity-shuffle ablation: same reciprocity logic, but predator identities are randomly remapped each episode.
  • Optional indirect reciprocity comparison: public reputation signal instead of private pairwise memory.

Ecological settings that make direct reciprocity testable:

  • Spawn offspring near parents so the same predators meet repeatedly.
  • Keep movement costs and energy decay moderate so repeated interaction matters.
  • Keep prey abundant enough that sharing is feasible, but not so abundant that social help is irrelevant.
  • Keep lifetimes long enough for remembered favors to be returned.

What should emerge if direct reciprocity is real:

  • Predators share or assist reliable partners more than unreliable partners.
  • Predators reduce helping after a partner failed to reciprocate.
  • Cooperation is stronger with partner memory than without it.
  • Cooperation collapses or weakens strongly when identities are shuffled.

Minimal metrics:

  • P(share | partner shared with me before)
  • P(share | partner did not share with me before)
  • P(assist | partner assisted me before)
  • Mean energy transferred per dyad over time
  • Share/assist rate for familiar partners versus unfamiliar partners
  • Change in helping probability after partner defection
  • Reproduction rate under baseline vs reciprocity vs identity-shuffle
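The first two metrics can be computed from a simple event log, for example (a sketch; the log format is an assumption):

```python
from collections import defaultdict

def reciprocity_metrics(events):
    """events: list of (giver, receiver, shared) sharing opportunities.
    Returns P(share | partner shared with me before) and
    P(share | partner did not), keyed by True/False."""
    helped_by = defaultdict(set)             # agent -> agents who helped it
    counts = {True: [0, 0], False: [0, 0]}   # condition -> [shares, opportunities]
    for giver, receiver, shared in events:
        cond = receiver in helped_by[giver]  # did receiver help giver before?
        counts[cond][1] += 1
        if shared:
            counts[cond][0] += 1
            helped_by[receiver].add(giver)
    return {c: (s / n if n else None) for c, (s, n) in counts.items()}

log = [(0, 1, True),   # 0 shares with 1 (no prior help from 1)
       (1, 0, True),   # 1 reciprocates toward 0
       (0, 2, False),  # 0 refuses 2 (no prior help from 2)
       (0, 1, True)]   # 0 keeps sharing with a proven reciprocator
m = reciprocity_metrics(log)
assert m[True] == 1.0 and m[False] == 0.5
```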

Interpretation:

  • If helping rises only when partner-specific memory is available, then cooperation is no longer explained by immediate ecological necessity.
  • It is explained by expected future return from repeated interaction: direct reciprocity.

[ ] Mixed Stag Hunt

Macro-level energy

  • Add it to the file already in place: energy_by_type.json (created by evaluate_......_debug.py)
  • Subtract cumulative decay energy of Predator and Prey per step (homeostatic energy)
  • Add cumulative photosynthesis energy from grass
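A sketch of that bookkeeping (the field names and the update function are assumptions; only the file name energy_by_type.json comes from the notes above):

```python
def update_energy_ledger(ledger, decay_predator, decay_prey, photosynthesis):
    """Accumulate per-step macro-level energy flows: homeostatic decay
    is an outflow (subtracted), grass photosynthesis an inflow (added)."""
    ledger["cumulative_decay_predator"] = ledger.get("cumulative_decay_predator", 0.0) - decay_predator
    ledger["cumulative_decay_prey"] = ledger.get("cumulative_decay_prey", 0.0) - decay_prey
    ledger["cumulative_photosynthesis"] = ledger.get("cumulative_photosynthesis", 0.0) + photosynthesis
    return ledger

ledger = {}
for _ in range(2):  # two simulated steps
    update_energy_ledger(ledger, decay_predator=1.5, decay_prey=0.8, photosynthesis=2.0)
assert ledger["cumulative_photosynthesis"] == 4.0
# In practice, merge this dict into the existing energy_by_type.json output.
```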

Layered cooperation in SocialBehavior

  • MARL Book example
  • Display Maslow's pyramid and describe the project from bottom to top:
  • First layer: the PredatorPreyGrass project. Typical of the first layer: physical needs (eating), survival and reproduction.
  • Second layer: social needs. The need to cooperate.

Dynamic training

  • Create a training algorithm of competing policies and select a 'winner' after each iteration (or after a number of iterations). Competing policies have different environment configs. Goal: optimize environment parameters more efficiently and automatically at run time, rather than manually after full (10-hour) experiments. Determine success via:

    • fitness metrics
    • ability to co-adapt
  • curriculum reward tuning

Examples to try out

Environment enhancements

  • Male & Female reproduction instead of asexual reproduction

  • Build wall or move wall

  • Adding water/rivers

    Experiments

  • Tuning hyperparameters and env parameters simultaneously (see chat)

  • max_steps_per_episode: For policy learning performance: 500–2000 steps per episode is a common sweet spot in multi-agent RL — long enough for interactions to unfold, short enough for PPO to assign credit.

    For open-ended co-evolution (your case): you might intentionally want longer episodes (e.g. 2000–5000) so emergent dynamics have time to play out, even if training is slower.

    A good trick is to curriculum the horizon:

    Start short (e.g. 500–1000) → agents learn basic survival.

    Gradually increase (e.g. +500 every N iterations) → expose them to longer ecological timescales.

    A “works-in-practice” plan for your PredPreyGrass run, plus what to tweak as you lengthen episodes:

    Start shorter for stability/throughput, then stretch to let eco-dynamics (booms, busts, Red-Queen) unfold.

    Phase A (bootstrap)

    • max_steps = 1_000
    • gamma = 0.995 (effective credit horizon ≈ 1/(1−γ) ≈ 200 steps)
    • lambda_ (GAE) = 0.95–0.97

    Phase B (mid)

    • max_steps = 2_000–3_000
    • gamma = 0.997–0.998 (horizon ≈ 333–500)
    • lambda_ = 0.96–0.97

    Phase C (long-term dynamics)

    • max_steps = 4_000–5_000
    • gamma = 0.998–0.999 (horizon ≈ 500–1,000)
    • lambda_ = 0.97

    Why that mapping? PPO’s useful credit horizon is ~1/(1−γ). As you increase max_steps, you raise γ so actions can “see” far enough ahead without making variance explode.
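The γ-to-horizon mapping used in those phases can be written as a tiny helper (a sketch of the stated heuristic, nothing more):

```python
def gamma_for_horizon(horizon_steps):
    """Pick gamma so the effective credit horizon 1/(1 - gamma)
    matches the desired number of steps."""
    return 1.0 - 1.0 / horizon_steps

def horizon_for_gamma(gamma):
    return 1.0 / (1.0 - gamma)

# Sanity checks against the phase table above:
assert round(horizon_for_gamma(0.995)) == 200    # Phase A
assert round(horizon_for_gamma(0.998)) == 500    # Phase B/C boundary
assert round(horizon_for_gamma(0.999)) == 1000   # Phase C
assert abs(gamma_for_horizon(200) - 0.995) < 1e-12
```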

    Batch/throughput knobs to adjust as episodes get longer

    Keep ~4–10 episodes per PPO iteration so you still get decent reset diversity:

    • train_batch_size: roughly episodes_per_iter × max_steps. Example: at max_steps=1_000, use 8_000–16_000. When you move to max_steps=3_000, bump toward 24_000–48_000.
    • rollout_fragment_length: increase with horizon so GAE has longer contiguous fragments (e.g., 200 → 400 → 800).
    • num_envs_per_env_runner: raise a bit as episodes lengthen to maintain sampler throughput.
    • KL/clip: leave defaults unless you see instability; longer horizons often benefit from slightly smaller learning rate rather than big clip/kl changes.
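The episodes-per-iteration rule of thumb as a helper (RLlib-style key names; exact config keys may differ by version, and episodes_per_iter=8 is just a middle-of-the-range choice):

```python
def batch_config(max_steps, episodes_per_iter=8):
    """Scale PPO batch knobs with the episode horizon: keep roughly
    4-10 episodes per iteration, and grow GAE fragments with max_steps."""
    return {
        "train_batch_size": episodes_per_iter * max_steps,
        "rollout_fragment_length": max(200, max_steps // 5),  # 200 -> 400 -> 800
    }

assert batch_config(1000) == {"train_batch_size": 8000, "rollout_fragment_length": 200}
assert batch_config(3000)["train_batch_size"] == 24000
assert batch_config(4000)["rollout_fragment_length"] == 800
```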

    When to stop stretching episodes

    • If timing/iter_minutes balloons or TensorBoard curves update too slowly, hold the current max_steps for a while.
    • If you see extinction before the cap, longer episodes won’t help—tune ecology (e.g., energy gains/losses) instead.

    Make the BHP archive available in a repository

    LT goal: acquire more wealth as a population

    • Energy as a proxy of wealth
    • Only the top 10% of energy reproduces?
    • Escaping the Malthusian trap

    Integrate Dynamic Field Theory

    • Wrapper around brain
    • Visualize first!!!

    Post The Behavior Patterns Project on LinkedIn?

    The Malthusian Trap in a Predator–Prey Co-Evolutionary System

    • Limit population size of predators or prey; is that beneficial compared to unbounded reproduction?
    • [2.5]: experiment_1 / experiment are powerful examples of the Malthusian trap. Record this on the site. Create a "Malthusian Trap"

Pranjal 2-12-2025

  • Communication: leave an ant-like trace (ant colony / Lenia); also keep the previous state in the Observation?
  • www.talkRL.com
  • Reshape the field of vision for Predators: only in the direction of movement? That way Prey can hide more easily from Predators.
  • Is the existence of a prolonged episode between predators and prey not itself an emergence of cooperation?

Research shortlist: evolution + birth/death + MARL