# Version 1.0.0 Changelog

**Release Date:** January 2026
**Status:** Stable Release
**Theme:** Complete MARL Framework

## Major Features

### MARL Environment System
- Observation System: Configurable feature extraction via ObservationExtractor
- Reward Calculation: Multi-objective rewards (collision, success, progress, safety, speed)
- Termination Logic: Episode ending based on collision, completion, or timeout
- Evaluation Metrics: Cross-agent performance comparison
- CARLA Integration: Direct connection without Gym dependency
- SUMO Mode: Lightweight traffic-only simulation via SumoMarlEnv
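The pieces above fit together in a single episode loop; `DummyMarlEnv` below is an illustrative stand-in (not a framework class) for the CARLA- and SUMO-backed environments:

```python
class DummyMarlEnv:
    # Illustrative stand-in for the CARLA/SUMO-backed environments, tying
    # together observation, reward, and termination in one episode loop.
    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return {"ego": [0.0] * 8}            # per-agent observation vector

    def step(self, actions):
        self.t += 1
        obs = {"ego": [float(self.t)] * 8}
        reward = {"ego": -0.5}               # per-step cost (see reward system)
        done = self.t >= self.max_steps      # timeout termination
        return obs, reward, done

env = DummyMarlEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done = env.step({"ego": [0.0]})
```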
### Multi-Agent Framework
```python
# Agent Factory - Centralized agent creation
from opencda_marl.core.agents.agent_factory import AgentFactory

# Five implemented agent types
behavior_agent = AgentFactory.create_agent("behavior", config)
vanilla_agent = AgentFactory.create_agent("vanilla", config)
rule_based_agent = AgentFactory.create_agent("rule_based", config)
marl_agent = AgentFactory.create_agent("marl", config)
basic_agent = AgentFactory.create_agent("basic", config)
```
- MARLAgent: RL-controlled speed, local planner handles steering. Returns (speed, location)
- Behavior Agent: Simplified OpenCDA behavior cloning with route following
- Vanilla Agent: Enhanced safety with multi-vehicle TTC tracking
- Rule-based Agent: 3-stage intersection navigation (junction → following → cruising)
- Basic Agent: Full autonomous driving with traffic light and obstacle detection
- Vehicle Adapters: Bridge OpenCDA VehicleManager with MARL agent control
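The factory pattern behind `AgentFactory` can be sketched as a small registry; the real class lives in `opencda_marl.core.agents.agent_factory`, and the decorator-based registration here is illustrative:

```python
class AgentFactory:
    # Minimal registry sketch: agent type strings map to classes, and
    # create_agent dispatches on the string.
    _registry = {}

    @classmethod
    def register(cls, name):
        def decorator(agent_cls):
            cls._registry[name] = agent_cls
            return agent_cls
        return decorator

    @classmethod
    def create_agent(cls, name, config):
        if name not in cls._registry:
            raise ValueError(f"unknown agent type: {name!r}")
        return cls._registry[name](config)

@AgentFactory.register("marl")
class MARLAgent:
    def __init__(self, config):
        self.config = config

agent = AgentFactory.create_agent("marl", {"state_dim": 8})
```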
### RL Algorithm Suite
```yaml
# TD3 - Continuous control with LSTM encoder
MARL:
  algorithm: "td3"
  state_dim: 8
  action_dim: 1
  td3:
    learning_rate_actor: 0.0001
    learning_rate_critic: 0.001
    exploration_noise: 0.3
    noise_decay: 0.998
    min_noise: 0.05
    warmup_steps: 1000
    lstm_hidden: 256
```
Key features: LSTM multi-agent context encoding, LayerNorm before tanh, delayed policy updates, prioritized experience replay (optional), smart replay buffer with recency bias.
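A minimal sketch of the actor described above, assuming the LSTM summarizes the multi-agent observation sequence and LayerNorm is applied to the encoding just before the tanh-bounded action head; layer sizes follow the config, but the exact architecture may differ:

```python
import torch
import torch.nn as nn

class TD3Actor(nn.Module):
    # Sketch: LSTM encodes the per-agent observation sequence; LayerNorm
    # stabilizes the encoding before the tanh-bounded action head.
    def __init__(self, state_dim=8, action_dim=1, lstm_hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(state_dim, lstm_hidden, batch_first=True)
        self.norm = nn.LayerNorm(lstm_hidden)
        self.head = nn.Linear(lstm_hidden, action_dim)

    def forward(self, obs_seq):
        # obs_seq: (batch, sequence, state_dim)
        _, (h, _) = self.encoder(obs_seq)   # final hidden state as context
        return torch.tanh(self.head(self.norm(h[-1])))

actor = TD3Actor()
action = actor(torch.zeros(4, 3, 8))        # batch of 4, 3-step context
```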
### Training Infrastructure
```python
# MARLManager orchestrates the active algorithm
from opencda_marl.core.marl.marl_manager import MARLManager

manager = MARLManager(config)
action = manager.select_action(observations, ego_id, training=True)
manager.store_transition(obs, ego_id, action, reward, next_obs, done)
losses = manager.update()

# CheckpointManager - Structured model saving
from opencda_marl.core.marl.checkpoint import CheckpointManager

checkpoint_mgr = CheckpointManager(config)
checkpoint_mgr.save(algorithm, episode, reward)  # latest + best + per-episode
checkpoint_mgr.load(algorithm, mode="best")      # load best model

# TrainingMetrics - Episode statistics with CSV export
from opencda_marl.core.marl.metrics import TrainingMetrics

metrics = TrainingMetrics(config)
metrics.update(episode_data)
metrics.export_csv()  # Export to metrics_history/
```
- SmartReplayBuffer: Pre-allocated numpy arrays, O(1) push/sample, recency bias (50% recent + 50% diverse)
- PrioritizedReplayBuffer: TD-error weighted sampling, importance sampling with beta annealing
- RolloutBuffer: On-policy buffer for MAPPO with GAE computation
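The SmartReplayBuffer's sampling rule can be sketched as a ring buffer whose batches mix recent and uniformly drawn transitions; class and parameter names here are illustrative, and only states and rewards are stored to keep the sketch short:

```python
import numpy as np

class RecencyBiasBuffer:
    # Sketch of the recency-bias rule: pre-allocated arrays, O(1) ring-buffer
    # push, and batches drawn 50% from recent slots, 50% from the whole buffer.
    def __init__(self, capacity, state_dim, recent_window=1000):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.idx = 0                       # next write slot
        self.size = 0
        self.recent_window = recent_window

    def push(self, state, reward):         # O(1): overwrite the oldest slot
        self.states[self.idx] = state
        self.rewards[self.idx] = reward
        self.idx = (self.idx + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        n_recent = batch_size // 2
        window = min(self.recent_window, self.size)
        recent = (self.idx - 1 - np.random.randint(0, window, n_recent)) % self.capacity
        diverse = np.random.randint(0, self.size, batch_size - n_recent)
        ids = np.concatenate([recent, diverse])
        return self.states[ids], self.rewards[ids]

buf = RecencyBiasBuffer(capacity=128, state_dim=8)
for i in range(40):
    buf.push(np.full(8, i, dtype=np.float32), float(i))
states, rewards = buf.sample(16)
```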
Automatic convergence detection based on:
- Coefficient of variation (CV) < 15% over rolling window of 10 episodes
- Success rate stability (CV < 20%)
- Collision rate improving (second half ≤ first half × 1.1)
- Minimum 20 episodes before checking
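The criteria above can be checked in a few lines; the function name and epsilon handling are illustrative, while the thresholds follow the listed rules:

```python
import numpy as np

def is_converged(rewards, success_rates, collision_flags,
                 window=10, min_episodes=20):
    if len(rewards) < min_episodes:
        return False                                    # too few episodes
    r = np.asarray(rewards[-window:], dtype=float)
    s = np.asarray(success_rates[-window:], dtype=float)
    reward_cv = np.std(r) / (abs(np.mean(r)) + 1e-8)    # CV over rolling window
    success_cv = np.std(s) / (abs(np.mean(s)) + 1e-8)
    c = np.asarray(collision_flags, dtype=float)
    half = len(c) // 2
    collisions_ok = c[half:].mean() <= c[:half].mean() * 1.1 + 1e-8
    return bool(reward_cv < 0.15 and success_cv < 0.20 and collisions_ok)
```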
### GUI Dashboard System
- Main Dashboard: Central control interface with PySide6 Qt widgets
- Observation Viewer: Real-time agent state visualization
- Step Controller: Manual simulation stepping and episode management
- Widget Panels: Agent observation, environment, metrics, reward, system, traffic, weather
### Traffic Management System
- Record: Record actual simulation vehicle behavior to JSON/HDF5
- Replay: Replay pre-recorded traffic patterns for reproducibility
- Live: Generate traffic on-the-fly using flow configuration
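The record/replay round trip for the JSON backend can be sketched as follows; the frame field names are illustrative, not the recorder's exact schema:

```python
import json

# Record phase: capture one frame of vehicle state per simulation tick.
frames = []

def record_frame(t, vehicles):
    frames.append({"t": t, "vehicles": vehicles})

record_frame(0.00, [{"id": 1, "x": 10.0, "y": 2.0, "speed": 12.5}])
record_frame(0.05, [{"id": 1, "x": 10.6, "y": 2.0, "speed": 12.5}])

payload = json.dumps(frames)              # what would be written to disk

# Replay phase: re-apply each recorded frame at its timestamp.
def replay(payload):
    for frame in json.loads(payload):
        yield frame["t"], frame["vehicles"]

replayed = list(replay(payload))
```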
## Technical Implementation

### Observation System
The ObservationExtractor supports 9 configurable feature types:
| Feature | Description |
|---|---|
| `rel_x`, `rel_y` | Relative position to ego |
| `heading` | Vehicle orientation (radians) |
| `speed` | Current velocity |
| `distance_to_intersection` | Remaining distance to junction |
| `distance_to_front` | Distance to nearest vehicle ahead |
| `lane_position` | Lateral position in lane |
| `waypoint_buffer` | Next waypoint distance |
| `min_ttc` | Minimum time-to-collision |
| `distance_to_destination` | Remaining route distance |
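One way such a configurable extractor can assemble its vector is via a feature registry keyed by the names in the table; the lambdas below only illustrate the per-feature computation, not the real `ObservationExtractor` internals:

```python
# Hypothetical feature registry mirroring the table above.
FEATURE_FNS = {
    "rel_x":   lambda ego, v: v["x"] - ego["x"],
    "rel_y":   lambda ego, v: v["y"] - ego["y"],
    "heading": lambda ego, v: v["yaw"],        # radians
    "speed":   lambda ego, v: v["speed"],
}

def extract(ego, vehicle, features):
    # Build the observation vector in the order the config lists features.
    return [FEATURE_FNS[f](ego, vehicle) for f in features]

vec = extract({"x": 0.0, "y": 0.0},
              {"x": 3.0, "y": 4.0, "yaw": 0.5, "speed": 10.0},
              ["rel_x", "rel_y", "speed"])
```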
### Reward System
Multi-objective rewards configurable via YAML:
| Component | Default Value | Description |
|---|---|---|
| Collision | -500 | Terminal penalty on collision |
| Success | +400 | Terminal reward on reaching destination |
| Step penalty | -0.5 | Per-step cost to encourage efficiency |
| Speed bonus | +1.0 | Reward for maintaining target speed |
| Progress | scaled | Based on distance to destination |
| Stop penalty | -3.0 | Penalty for stopped vehicles |
| Yielding bonus | +1.0 | Reward for yielding to obstacles |
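Combining a subset of these components looks roughly like the sketch below; the scaled progress term (0.1 per metre) is illustrative, not the framework's exact formula:

```python
def compute_reward(step, weights=None):
    # Defaults follow the table above; callers may override via `weights`.
    w = {"collision": -500.0, "success": 400.0, "step": -0.5,
         "speed_bonus": 1.0, "stop": -3.0, **(weights or {})}
    if step["collision"]:
        return w["collision"]                 # terminal penalty
    if step["reached_destination"]:
        return w["success"]                   # terminal reward
    r = w["step"]                             # per-step efficiency cost
    r += step["progress_m"] * 0.1             # scaled progress toward goal
    if abs(step["speed"] - step["target_speed"]) < 1.0:
        r += w["speed_bonus"]                 # near target speed
    if step["speed"] < 0.1:
        r += w["stop"]                        # stopped-vehicle penalty
    return r
```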
### TensorBoard Logging
Comprehensive training metrics logged to TensorBoard:
- Losses: Critic loss, actor loss
- Q-values: Q1 mean, Q2 mean
- Gradients: Pre-clip norms for critic and actor
- Exploration: Noise level over time
- Learning: Reward moving average, coefficient of variation
- Safety: Near-miss count, TTC violation rate
- Traffic: Average speed, speed gap, throughput
## Dependencies

| Package | Version | Purpose |
|---|---|---|
| `omegaconf` | 2.3+ | Configuration management |
| `loguru` | 0.7+ | Enhanced logging |
| `mkdocs-material` | 9.5+ | Documentation theme |
| `torch` | 2.0+ | Deep learning framework |
| `numpy` | 1.24+ | Numerical computing |
| `pyside6` | 6.0+ | GUI framework |
| `tensorboard` | 2.0+ | Training visualization |
## Configuration Schema

```yaml
# Base MARL configuration structure (configs/marl/default.yaml)
meta:
  simulator: "carla"          # or "sumo"
world:
  sync_mode: true
  client_port: 2000
  fixed_delta_seconds: 0.05
scenario:
  max_steps: 2400
  max_episodes: 500
traffic:
  mode: "replay"
  replay_file: "recordings/lite_2minL.json"
  base_speed: 45.0
MARL:
  algorithm: "td3"            # td3, dqn, q_learning, mappo, sac
  state_dim: 8
  action_dim: 1
  training: true
agents:
  agent_type: "marl"          # marl, vanilla, behavior, rule_based
tensorboard:
  enabled: true
  log_dir: "runs"
world_reset:
  enabled: true
  interval_episodes: 50
```
## API Changes

### New Classes
```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseAlgorithm(ABC):
    """Abstract base for all RL algorithms."""

    @abstractmethod
    def select_action(self, state: Any, training: bool) -> Any: ...

    @abstractmethod
    def store_transition(self, state, action, reward, next_state, done) -> None: ...

    @abstractmethod
    def update(self) -> Dict[str, float]: ...

    @abstractmethod
    def reset_episode(self) -> None: ...

    @abstractmethod
    def get_training_info(self) -> Dict: ...

    @abstractmethod
    def save(self, path: str) -> None: ...

    @abstractmethod
    def load(self, path: str) -> None: ...
```
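A new algorithm plugs in by subclassing and implementing these hooks. The throwaway example below shows the shape; `RandomPolicy` is illustrative and not part of the framework, and the two-method `BaseAlgorithm` stub only repeats enough of the interface for the snippet to run standalone:

```python
from abc import ABC, abstractmethod
import random

class BaseAlgorithm(ABC):      # stub of the interface, repeated for standalone use
    @abstractmethod
    def select_action(self, state, training=True): ...
    @abstractmethod
    def update(self): ...

class RandomPolicy(BaseAlgorithm):
    # Illustrative no-op algorithm: actions uniform in [-1, 1],
    # updates report zero losses.
    def select_action(self, state, training=True):
        return [random.uniform(-1.0, 1.0)]

    def update(self):
        return {"actor_loss": 0.0, "critic_loss": 0.0}

policy = RandomPolicy()
action = policy.select_action(state=None)
```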