Models
======

The models module (``webgym/models/``) provides the core agent and evaluation components for web automation.

Module Structure
----------------

.. code-block:: text

   webgym/models/
   ├── __init__.py
   ├── web_agent.py         # WebAgent class for action generation
   ├── evaluator.py         # Evaluator class for reward computation
   ├── model_factory.py     # Model interface factory
   ├── base/
   │   ├── model_interface.py       # Abstract model interface
   │   ├── conversation_builder.py  # Conversation building
   │   ├── evaluation_prompt.py     # Evaluation prompt templates
   │   └── prompt_processing.py     # Prompt processing utilities
   └── qwen/
       ├── qwen_interface.py        # Qwen3-VL model interface
       └── conversation_builder.py  # Qwen3-VL conversation builder

WebAgent
--------

The ``WebAgent`` class (``web_agent.py``) generates browser actions, running inference through a vLLM server.

.. code-block:: python

   from webgym.models import WebAgent

   agent = WebAgent(
       policy_config=policy_config,
       context_config=context_config,
       model_config={'model_type': 'qwen3-instruct'},
       save_path='/path/to/checkpoints',
       vllm_server_url='http://localhost:8999',
       openai_config=openai_config,  # Creates internal Evaluator
       operation_timeout=120,
       vllm_timeout=120,
       max_retries=1,
       max_vllm_sessions=32,
       verbose=True
   )

**Key Methods:**

``get_action_and_observation_sync(trajectory, screenshot_path, page_metadata, step_data)``
   Returns ``(Action, Response)`` for the next step based on current state and trajectory history.

``parse_action_to_browser_command(action)``
   Converts an Action object to a browser-executable command.

When ``openai_config`` is provided, the evaluator is accessible via ``agent.evaluator``.

Evaluator
---------

The ``Evaluator`` class (``evaluator.py``) handles trajectory evaluation using vision models (OpenAI/Gemini).

**Key Methods:**

``get_verifiable_reward(trajectory)``
   Returns ``(reward, evaluation_texts, is_blocked)``. Verification runs in four steps:

   1. ``judge_submission_images()``: selects the relevant screenshots.
   2. **Criterion B** (anti-hallucination): checks the agent's response against the screenshots.
   3. **Criterion A** (fact verification): checks each rubric item or fact against the screenshots.
   4. **Reference answer**: if a reference answer exists and every Criterion A check passed, verifies that the agent's answer matches it.

   Final reward: ``1`` if all checks pass, ``0`` otherwise.
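
   The all-or-nothing rule can be illustrated with a minimal sketch. The function name ``combine_checks`` and its parameters are hypothetical and not part of ``Evaluator``; the sketch only assumes that each check yields a boolean and that a missing reference answer is represented as ``None``.

   .. code-block:: python

      from typing import Optional, Sequence


      def combine_checks(
          criterion_b_passed: bool,
          criterion_a_results: Sequence[bool],
          reference_match: Optional[bool] = None,
      ) -> int:
          """Return 1 only if every check passes, otherwise 0 (hypothetical helper)."""
          # Criterion B: the agent's response must be supported by the screenshots.
          if not criterion_b_passed:
              return 0
          # Criterion A: every rubric item or fact must be verified.
          # Assumption: an empty rubric counts as passing.
          if not all(criterion_a_results):
              return 0
          # Reference answer: only checked when a reference exists (None = no reference).
          if reference_match is not None and not reference_match:
              return 0
          return 1


      reward = combine_checks(True, [True, True], reference_match=None)

   Returning early on the first failed check mirrors the ordering of the steps: the reference answer is only consulted once Criterion A has fully passed.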

``check_if_blocked(trajectory)``
   Samples up to 20 screenshots from the trajectory and checks them for CAPTCHA or other blocking pages.
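
   The sampling strategy is not specified here. As one possible approach, the sketch below picks evenly spaced screenshots so that the first and last steps are always included. The helper ``sample_evenly`` is hypothetical and may differ from what ``check_if_blocked`` actually does.

   .. code-block:: python

      from typing import List, Sequence


      def sample_evenly(paths: Sequence[str], max_samples: int = 20) -> List[str]:
          """Pick at most ``max_samples`` evenly spaced items (hypothetical helper)."""
          if max_samples <= 0:
              return []
          n = len(paths)
          # Short trajectories are returned unchanged.
          if n <= max_samples:
              return list(paths)
          if max_samples == 1:
              return [paths[0]]
          # Integer arithmetic keeps indices distinct and pins the last index to n - 1.
          indices = [(i * (n - 1)) // (max_samples - 1) for i in range(max_samples)]
          return [paths[i] for i in indices]


      screenshots = [f"step_{i:03d}.png" for i in range(100)]
      sampled = sample_evenly(screenshots)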

``check_single_screenshot_for_blocking(screenshot_path, task_name, step_number)``
   Checks a single screenshot for blocking in real time during a rollout.

Configuration
-------------

Example settings for the evaluator (``openai_config``) and the policy (``policy_config``):

.. code-block:: yaml

   openai_config:
     model: "gemini-3-flash-preview"
     openai_api_key_env_var: "GEMINI_API_KEY"
     base_url: "https://generativelanguage.googleapis.com/v1beta/openai/"

   policy_config:
     base_model: "Qwen/Qwen3-VL-8B-Instruct"
     temperature: 1
     top_p: 0.99
     top_k: 2
     max_new_tokens: 3072
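
The ``openai_api_key_env_var`` field appears to name an environment variable rather than hold the key itself. A minimal sketch of resolving it, assuming the configuration has already been loaded into a dictionary, might look like the following. The helper ``resolve_api_key`` and its ``environ`` parameter are hypothetical and not part of ``webgym``.

.. code-block:: python

   import os
   from typing import Mapping, Optional


   def resolve_api_key(
       openai_config: Mapping[str, str],
       environ: Optional[Mapping[str, str]] = None,
   ) -> str:
       """Look up the API key named by ``openai_api_key_env_var`` (hypothetical helper)."""
       env = os.environ if environ is None else environ
       var_name = openai_config["openai_api_key_env_var"]
       key = env.get(var_name)
       if not key:
           # Fail early with a clear message instead of sending an empty key.
           raise RuntimeError(f"Environment variable {var_name!r} is not set")
       return key


   config = {
       "model": "gemini-3-flash-preview",
       "openai_api_key_env_var": "GEMINI_API_KEY",
       "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
   }
   # A fake environment is passed here so the example does not need a real key.
   api_key = resolve_api_key(config, environ={"GEMINI_API_KEY": "example-key"})

Keeping only the variable name in the configuration file means the file can be committed without exposing credentials.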