Research Project

Meta-Learning RL System for Non-Stationary Environments

A reinforcement learning system designed to stay useful after the environment changes, not just while it stays familiar.

This work focused on combining dual-agent PPO training with neuromodulation-inspired adaptation so the policy could recover quickly after distribution shifts while resisting catastrophic forgetting over repeated task transitions.

Paper Preview

Open the research paper.

This preview links directly to the paper covering the training setup, adaptation strategy, and measured recovery results.

Why This Matters

Most RL systems break when the world changes.

Non-stationary settings are where a lot of promising RL work falls apart. The goal here was to make policy adaptation a first-class behavior rather than an afterthought, so the agent could recover under changing conditions without resetting to square one.

Problem

Standard policies often overfit to a single training regime and lose performance sharply after distribution shifts or repeated task changes.

Approach

I structured the system around coordinated PPO agents with adaptation signals inspired by neuromodulation, allowing the policy to respond more fluidly to changing environments.

Target Outcome

Preserve useful prior behavior while recovering quickly enough to stay competitive after each environmental shift.

System Design

Adaptation was built into the control loop.

The system centered on three design choices that made the adaptation loop more resilient under repeated distribution shifts.

Dual-Agent PPO

Separate policy responsibilities created a cleaner boundary between stable control behavior and fast adaptation behavior.
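
A minimal sketch of how such a split could look, assuming one stable agent and one fast agent whose action probabilities are mixed by a weight. The clipped surrogate is the standard PPO objective; the function names, the mixing rule, and `fast_weight` are hypothetical illustrations, not the implementation described in the paper.

```python
from typing import List


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample.

    ratio is pi_new(a|s) / pi_old(a|s); the clip keeps one update from
    moving the policy too far from the policy that collected the data.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def blend_policies(stable_probs: List[float], fast_probs: List[float],
                   fast_weight: float) -> List[float]:
    """Mix two action distributions (hypothetical mixing rule).

    fast_weight = 0 uses only the stable agent; fast_weight = 1 uses
    only the fast-adapting agent.
    """
    if len(stable_probs) != len(fast_probs):
        raise ValueError("both agents must score the same action set")
    if not 0.0 <= fast_weight <= 1.0:
        raise ValueError("fast_weight must lie in [0, 1]")
    mixed = [(1.0 - fast_weight) * s + fast_weight * f
             for s, f in zip(stable_probs, fast_probs)]
    total = sum(mixed)
    return [p / total for p in mixed]  # renormalize against rounding drift


# Placeholder distributions over two actions, not values from the paper.
print(blend_policies([0.7, 0.3], [0.1, 0.9], fast_weight=0.25))
```

Keeping the two distributions separate until the final mix means the stable agent's parameters never have to absorb the short-term corrections made by the fast one.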

Neuromodulation Signals

Modulatory signals were used to help the policy respond to changing conditions without fully overwriting previously learned behavior.

Forgetting Mitigation

Evaluation emphasized whether the system could recover across repeated shifts, not just succeed once in a favorable setting.
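
A sketch of how recovery across repeated shifts could be scored, assuming recovery is defined as post-shift performance divided by pre-shift performance for each transition. That definition, the function names, and the sample scores are assumptions for illustration; the paper's evaluation methodology may differ.

```python
from statistics import mean
from typing import Dict, List


def recovery_ratios(pre_shift: List[float], post_shift: List[float]) -> List[float]:
    """Per-shift recovery: score after re-adaptation / score before the shift."""
    if len(pre_shift) != len(post_shift):
        raise ValueError("need one pre-shift and one post-shift score per shift")
    if any(score <= 0 for score in pre_shift):
        raise ValueError("pre-shift scores must be positive")
    return [post / pre for pre, post in zip(pre_shift, post_shift)]


def summarize_recovery(ratios: List[float]) -> Dict[str, float]:
    """Report the worst, average, and best recovery over all shifts.

    The worst case matters most here: a system that recovers once but
    degrades on later shifts is showing cumulative forgetting.
    """
    return {"min": min(ratios), "mean": mean(ratios), "max": max(ratios)}


# Placeholder scores for three shifts, not measurements from the paper.
ratios = recovery_ratios([0.80, 0.75, 0.70], [0.68, 0.60, 0.63])
print(summarize_recovery(ratios))
```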

Research Link

Paper and highlights

Direct access to the paper behind the project.
Highlights catastrophic forgetting mitigation in non-stationary RL.
Documents the measured improvement in success from a 0.46 baseline to 0.73-0.75.
Captures the repeated 80-95% recovery behavior after environmental shifts.
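
For scale, the success figures listed above can be turned into absolute and relative gains. This is plain arithmetic on the two reported numbers; the helper name `gains` is hypothetical.

```python
from typing import Tuple


def gains(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute gain, gain relative to the baseline)."""
    absolute = improved - baseline
    return absolute, absolute / baseline


for improved in (0.73, 0.75):
    absolute, relative = gains(0.46, improved)
    print(f"{improved:.2f}: +{absolute:.2f} absolute, {relative:.0%} relative")
# prints:
# 0.73: +0.27 absolute, 59% relative
# 0.75: +0.29 absolute, 63% relative
```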

Technical Paper

Combating Catastrophic Forgetting

The full paper captures the training setup, evaluation methodology, and adaptation results behind the case study.
