MARL Environment API

Implementation Status

The MARL Environment is fully implemented as a custom environment (non-Gym) that provides direct integration with CARLA and OpenCDA. It manages the full RL loop: observation extraction via ObservationExtractor, action selection via MARLManager, reward calculation, and algorithm updates.

The MARL Environment (MARLEnv) provides a clean, direct interface for multi-agent reinforcement learning, designed for CARLA's dynamically changing vehicle population and free of any Gym dependency.

MARLEnv
├── Episode Management      # reset_episode(), step execution
├── Observation System      # ObservationExtractor for feature vectors
├── Reward Calculation      # Multi-objective configurable rewards
├── Event Tracking          # StepEvent-based collision/completion tracking
└── Training Integration    # MARLManager for algorithm updates

Core Classes

Custom MARL environment with direct CARLA integration.

class MARLEnv:
    """
    Custom MARL Environment for OpenCDA.

    Provides direct RL training loop without Gym dependency.
    Integrates with MARLScenarioManager for simulation control
    and MARLManager for algorithm orchestration.
    """
    def __init__(self, scenario_manager: MARLScenarioManager, config: Dict = None):
        """
        Initialize MARL Environment.

        Parameters
        ----------
        scenario_manager : MARLScenarioManager
            The scenario manager providing CARLA simulation control
        config : dict, optional
            MARL configuration (reward params, training settings, etc.);
            defaults to an empty configuration when omitted
        """

Key Methods

def reset_episode(self):
    """
    Reset for new episode.

    Resets scenario manager, clears event logs,
    and prepares for new training episode.
    """

def step(self):
    """
    Execute one environment step.

    Runs the full RL loop:
    1. Scenario manager step (vehicle control, traffic)
    2. Collect observations via ObservationExtractor
    3. Calculate rewards from step events
    4. MARLManager action selection and algorithm update

    Returns internally managed state; use getter methods
    for observations, rewards, and events.
    """
def get_observations(self) -> Dict:
    """
    Get observations for all active agents.

    Returns observations extracted by ObservationExtractor
    with configurable feature types.

    Returns
    -------
    observations : dict
        Agent observations (agent_id -> observation vector)
    """

def get_training_info(self) -> Dict[str, Any]:
    """
    Get current training information from MARLManager.

    Returns algorithm-specific training stats
    (losses, exploration noise, Q-values, etc.)
    """
def get_current_step_rewards(self) -> Dict[str, float]:
    """
    Get rewards from the current step.

    Returns
    -------
    rewards : dict
        Agent rewards (agent_id -> reward value)
    """

def get_reward_params(self) -> Dict:
    """
    Get current reward function parameters.

    Returns the configurable reward weights and values.
    """

Multi-objective reward components (configurable via YAML):

| Component    | Default | Description                              |
|--------------|---------|------------------------------------------|
| Collision    | -500    | Terminal penalty on collision            |
| Success      | +400    | Terminal reward on reaching destination  |
| Step penalty | -0.5    | Per-step cost for efficiency             |
| Speed bonus  | +1.0    | Reward for maintaining target speed      |
| Progress     | scaled  | Shaping based on distance to destination |
| Stop penalty | -3.0    | Penalty for stopped vehicles             |
| Yielding     | +1.0    | Reward for yielding to obstacles         |

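A hedged sketch of how these components might combine into a per-step reward. `compute_step_reward`, its flag arguments, and the terminal-reward precedence are illustrative assumptions, not the actual MARLEnv internals; only the default values come from the table above:

```python
# Default weights mirror the documented reward components.
REWARD_PARAMS = {
    "collision_penalty": -500.0,
    "success_reward": 400.0,
    "step_penalty": -0.5,
    "speed_bonus": 1.0,
    "stop_penalty": -3.0,
    "yielding_bonus": 1.0,
}

def compute_step_reward(params, collided, reached_goal,
                        at_target_speed, stopped, yielded, progress):
    """Illustrative combination of the multi-objective components."""
    if collided:                        # terminal penalty dominates
        return params["collision_penalty"]
    if reached_goal:                    # terminal success reward
        return params["success_reward"]
    reward = params["step_penalty"]     # per-step efficiency cost
    if at_target_speed:
        reward += params["speed_bonus"]
    if stopped:
        reward += params["stop_penalty"]
    if yielded:
        reward += params["yielding_bonus"]
    reward += progress                  # progress term, assumed pre-scaled
    return reward

# A non-terminal step at target speed with a small progress gain:
r = compute_step_reward(REWARD_PARAMS, collided=False, reached_goal=False,
                        at_target_speed=True, stopped=False,
                        yielded=False, progress=0.2)
# r = -0.5 + 1.0 + 0.2 = 0.7
```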
def get_current_events(self) -> List:
    """
    Get step events from current simulation step.

    Returns
    -------
    events : list[StepEvent]
        Events including collisions, completions, timeouts
    """

def get_episode_metrics(self) -> Dict:
    """
    Get aggregated episode metrics.

    Returns metrics like total reward, collision count,
    success count, average speed, etc.
    """

Observation Features

The ObservationExtractor supports 9 configurable feature types:

| Feature                  | Description                       |
|--------------------------|-----------------------------------|
| rel_x, rel_y             | Relative position to ego          |
| heading                  | Vehicle orientation (radians)     |
| speed                    | Current velocity                  |
| distance_to_intersection | Remaining distance to junction    |
| distance_to_front        | Distance to nearest vehicle ahead |
| lane_position            | Lateral position in lane          |
| waypoint_buffer          | Next waypoint distance            |
| min_ttc                  | Minimum time-to-collision         |
| distance_to_destination  | Remaining route distance          |
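
A minimal sketch of selecting a configured subset of these features into a fixed-length observation vector; `extract_observation` and the `vehicle_state` dict are illustrative assumptions, not the real `ObservationExtractor` API:

```python
import math

# Illustrative feature extraction: compute a small subset of the documented
# feature types and order them according to the configured feature list.

def extract_observation(vehicle_state, feature_types):
    features = {
        "rel_x": vehicle_state["x"] - vehicle_state["ego_x"],
        "rel_y": vehicle_state["y"] - vehicle_state["ego_y"],
        "heading": vehicle_state["yaw_rad"],
        "speed": vehicle_state["speed"],
        "min_ttc": vehicle_state.get("min_ttc", math.inf),  # no conflict -> inf
    }
    # Vector length and ordering follow the configured feature list.
    return [features[name] for name in feature_types]

obs = extract_observation(
    {"x": 12.0, "y": 3.0, "ego_x": 10.0, "ego_y": 3.0,
     "yaw_rad": 0.0, "speed": 8.5},
    feature_types=["rel_x", "rel_y", "speed"],
)
# obs -> [2.0, 0.0, 8.5]
```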

Usage Examples

from opencda_marl.envs.marl_env import MARLEnv

# Created automatically by coordinator, but can be used directly
env = MARLEnv(scenario_manager, config=marl_config)

# Training loop
env.reset_episode()
for step in range(max_steps):
    env.step()

    observations = env.get_observations()
    rewards = env.get_current_step_rewards()
    events = env.get_current_events()

from opencda_marl.coordinator import MARLCoordinator

# Coordinator creates and manages MARLEnv internally
coordinator = MARLCoordinator(config=config)
coordinator.initialize()

# MARLEnv is accessible via coordinator
coordinator.run()  # Handles full training loop

# Configure rewards in YAML
rewards:
  collision_penalty: -500
  success_reward: 400
  step_penalty: -0.5
  speed_bonus: 1.0
  stop_penalty: -3.0
  yielding_bonus: 1.0

Configuration Integration

# MARL environment settings in config
scenario:
  max_steps: 2400              # Maximum steps per episode
  max_episodes: 500            # Maximum training episodes

MARL:
  algorithm: "td3"             # RL algorithm selection
  state_dim: 8                 # Observation vector dimension
  action_dim: 1                # Action dimension (speed control)
  training: true               # Training vs evaluation mode

Location: opencda_marl/envs/marl_env.py