Policy Configuration

The policy module (webgym/models/) defines the web agent and model interfaces.

Module Structure

webgym/models/
├── __init__.py
├── web_agent.py          # Main WebAgent class
├── model_factory.py      # Model interface factory
├── base/                  # Base classes and utilities
│   ├── __init__.py
│   ├── model_interface.py
│   ├── conversation_builder.py
│   ├── evaluation_prompt.py
│   └── prompt_processing.py
└── qwen/                  # Qwen-specific implementations
    └── ...

WebAgent

The WebAgent class (web_agent.py) is the main policy interface:

Initialization:

from webgym.models.web_agent import WebAgent

agent = WebAgent(
    policy_config=policy_cfg,
    context_config=context_cfg,
    model_config={'model_type': 'qwen3-instruct'},
    save_path="/path/to/save",
    vllm_server_url="http://localhost:8999",
    openai_config=openai_cfg,
    vllm_timeout=120,
    max_vllm_sessions=32
)

Key Features:

  • vLLM integration for fast inference

  • Context management for conversation building

  • OpenAI API integration for reward evaluation

  • Concurrent request handling with semaphores

Key Parameters:

vllm_server_url

URL of the vLLM server for model inference

vllm_timeout

Timeout for vLLM requests in seconds

max_vllm_sessions

Maximum concurrent vLLM requests

openai_config

Configuration for OpenAI-based reward evaluation

Model Factory

The model_factory.py creates model-specific interfaces:

from webgym.models.model_factory import create_model_interface

interface = create_model_interface({
    'model_type': 'qwen3-instruct'  # or 'qwen3-think'
})

Supported Models:

  • qwen3-instruct: Qwen/Qwen3-VL-8B-Instruct (standard)

  • qwen3-think: Qwen/Qwen3-VL-8B-Thinking (with reasoning)

Base Classes

ModelInterface (base/model_interface.py)

Abstract interface for model-specific operations

ConversationBuilder (base/conversation_builder.py)

Builds conversations in model-specific formats

prompt_processing.py

Utilities for preparing prompts for vLLM format (batch_get_vllm_prompts) and HuggingFace format (batch_get_hf_prompts)