RL Pipeline Overview ==================== The WebGym RL pipeline consists of several interconnected components located in ``webgym/``. .. image:: ../../figures/rl_arch.png :alt: RL Pipeline Overview :align: center Component Summary ----------------- **Context Management** (``webgym/context/``) Handles conversation building and response parsing for different model types and interaction modes. **Replay Buffer** (``webgym/data/``) Manages trajectory storage, sampling, and filtering for training. **Rollout Collection** (``webgym/environment/``) Orchestrates parallel browser interactions and trajectory collection. **Policy Configuration** (``webgym/models/``) Defines the WebAgent and model interfaces for action generation. **Utilities** (``webgym/utils/``) Shared utilities including blocklist management, image processing, task sampling, task history tracking, and trajectory storage. .. code-block:: text webgym/utils/ ├── __init__.py ├── blocklist_manager.py # Blocked website management ├── image_utils.py # Image encoding/decoding for vision models ├── rollout_sampler.py # Task selection strategies for rollouts ├── task_history_manager.py # Task attempt history tracking └── trajectory_storage.py # Incremental trajectory file storage **WandB Logger** (``webgym/logging/``) Provides experiment tracking and logging integration. Data Flow --------- .. code-block:: text ┌─────────────────┐ │ Task Sampler │ └────────┬────────┘ │ ▼ ┌─────────────────┐ ┌─────────────────┐ │ AsyncWebGym │◄────►│ WebAgent │ │ (environment) │ │ (policy) │ └────────┬────────┘ └────────┬────────┘ │ │ │ ▼ │ ┌─────────────────┐ │ │ vLLM Server │ │ └─────────────────┘ ▼ ┌─────────────────┐ │ Replay Buffer │ └────────┬────────┘ │ ▼ ┌─────────────────┐ │ LLaMA-Factory │ │ Training │ └─────────────────┘