RL Pipeline Env
===============

This guide covers setting up the Python environment for the WebGym RL training pipeline.

Prerequisites
-------------

* Python 3.10+
* CUDA-compatible GPU (for training and vLLM inference)
* Conda or virtualenv (recommended)
* rsync (for fast parallel checkpoint copying)
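
The prerequisites above can be checked from a shell before creating the environment.
The following is a minimal sketch: ``require_cmd`` is a hypothetical helper (not part
of WebGym), and it assumes the executables are named ``python3``, ``conda``, ``rsync``,
and ``nvidia-smi`` on your system.

.. code-block:: bash

   # Hypothetical helper: report whether a command is available on PATH.
   require_cmd() {
     if command -v "$1" >/dev/null 2>&1; then
       echo "found: $1"
     else
       echo "missing: $1"
       return 1
     fi
   }

   require_cmd python3 || true
   require_cmd conda || true        # optional if you use virtualenv instead
   require_cmd rsync || true
   require_cmd nvidia-smi || true   # assumes the NVIDIA driver utilities are installed

   # Print the interpreter version; it should report 3.10 or newer.
   python3 --version || true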

Create Environment
------------------

.. code-block:: bash

   # Create a new conda environment
   conda create -n webgym python=3.10
   conda activate webgym

Install Dependencies
--------------------

1. **Install WebGym with training dependencies:**

   .. code-block:: bash

      pip install -e ".[train]"

   This installs all packages needed for the RL training pipeline, including
   PyTorch, transformers, vLLM, WandB, and other ML dependencies.

   .. tip::

      If you also need the rollout server (OmniBoxes) on the same machine,
      install everything at once:

      .. code-block:: bash

         pip install -e ".[all]"

2. **Clone and install LLaMA-Factory with DeepSpeed:**

   LLaMA-Factory requires a specific DeepSpeed version for compatibility.
   Clone the repository and install with all required extras:

   .. code-block:: bash

      # Clone LLaMA-Factory and pin to a known compatible commit
      git clone https://github.com/hiyouga/LLaMA-Factory.git
      git -C LLaMA-Factory checkout 8c74dca76a813129c175489c85bf50e2c614091f

      # Install with extras
      pip install -e "LLaMA-Factory/[metrics,deepspeed,transformers]" --no-build-isolation

   This installs:

   * ``metrics`` - Training metrics and evaluation
   * ``deepspeed`` - Distributed training with DeepSpeed (version constrained for compatibility)
   * ``transformers`` - HuggingFace transformers integration
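
To confirm that the clone is actually at the pinned commit, the checked-out hash can be
compared with the one used in the checkout step. This is a sketch only: ``check_commit``
is a hypothetical helper, and it assumes the clone lives in ``LLaMA-Factory`` under the
current directory.

.. code-block:: bash

   # Hypothetical helper: compare the checked-out commit of a clone ($1)
   # with an expected full commit hash ($2).
   check_commit() {
     actual=$(git -C "$1" rev-parse HEAD 2>/dev/null) || {
       echo "not a git repository: $1"
       return 1
     }
     if [ "$actual" = "$2" ]; then
       echo "commit OK"
     else
       echo "commit mismatch: $actual"
       return 1
     fi
   }

   check_commit LLaMA-Factory 8c74dca76a813129c175489c85bf50e2c614091f || true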

Verify Installation
-------------------

.. code-block:: bash

   # Check key packages
   python -c "import webgym; print('WebGym OK')"
   python -c "import llamafactory; print('LLaMA-Factory OK')"
   python -c "import deepspeed; print(f'DeepSpeed {deepspeed.__version__}')"
   python -c "import vllm; print('vLLM OK')"

Environment Variables
---------------------

Set the following environment variables before running the training pipeline:

.. code-block:: bash

   # HuggingFace token (required for model downloads)
   export HF_TOKEN="your-huggingface-token"

   # WandB API key (required for logging)
   export WANDB_API_KEY="your-wandb-api-key"

   # CPU cluster token (required for browser instances)
   export CPU_CLUSTER_TOKEN="your-cluster-token"

   # Optional: Gemini API key for evaluation
   export GEMINI_API_KEY="your-gemini-api-key"
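
A quick way to catch a missing variable before launching is to test each required name
in a loop. The sketch below uses a hypothetical helper, ``check_required_env``, and
treats the three variables marked "required" above as mandatory. It only sees variables
that have been exported.

.. code-block:: bash

   # Hypothetical helper: print each named variable that is unset or empty
   # in the exported environment; return non-zero if any is missing.
   check_required_env() {
     missing=0
     for name in "$@"; do
       if [ -z "$(printenv "$name")" ]; then
         echo "missing: $name"
         missing=1
       fi
     done
     return "$missing"
   }

   if check_required_env HF_TOKEN WANDB_API_KEY CPU_CLUSTER_TOKEN; then
     echo "all required variables are set"
   fi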

Troubleshooting
---------------

**DeepSpeed version conflict:**

If you see an error like ``deepspeed>=0.10.0,<=0.16.9 is required``, ensure you
installed LLaMA-Factory from the cloned repository as shown above, not from PyPI.
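
To see whether the installed DeepSpeed falls inside the range quoted in the error
message, the versions can be compared in the shell. This is a rough sketch:
``version_in_range`` is a hypothetical helper, it relies on ``sort -V`` (version sort,
as provided by GNU coreutils), and the bounds are copied from the example error above
and may differ in your message.

.. code-block:: bash

   # Hypothetical helper: succeed if version $1 lies in the inclusive range [$2, $3].
   version_in_range() {
     lowest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
     highest=$(printf '%s\n' "$1" "$3" | sort -V | tail -n 1)
     [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
   }

   installed=$(python -c "import deepspeed; print(deepspeed.__version__)" 2>/dev/null || true)

   if [ -z "$installed" ]; then
     echo "DeepSpeed is not importable"
   elif version_in_range "$installed" 0.10.0 0.16.9; then
     echo "DeepSpeed $installed is within 0.10.0 to 0.16.9"
   else
     echo "DeepSpeed $installed is outside 0.10.0 to 0.16.9"
   fi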

**Qwen3-VL processor not found:**

If you see ``ValueError: Processor was not found`` when training with Qwen3-VL models,
make sure transformers 4.57.1 or later is installed. The command below pins 4.57.1:

.. code-block:: bash

   pip install "transformers==4.57.1"

Version 4.57.1 includes ``Qwen3VLProcessor``, which is required for vision-language training.
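
The installed version and the availability of the processor class can be checked
directly. The sketch below assumes that ``Qwen3VLProcessor`` is importable from the
top-level ``transformers`` package; ``check_qwen3vl`` is a hypothetical helper name.

.. code-block:: bash

   # Hypothetical helper: report whether the processor class can be imported.
   check_qwen3vl() {
     python -c "from transformers import Qwen3VLProcessor; print('Qwen3VLProcessor OK')" 2>/dev/null \
       || echo "Qwen3VLProcessor is not available in this environment"
   }

   # Print the installed transformers version, then test the import.
   python -c "import transformers; print(transformers.__version__)" 2>/dev/null \
     || echo "transformers is not importable"
   check_qwen3vl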

**vLLM server errors (HTTP 500):**

If you see ``vLLM server returned status 500`` errors during rollout, the vLLM server
is likely overloaded. Try reducing ``env_config.server_size`` (for example, from 112 to a
value between 64 and 96). See :ref:`vllm-server-errors` for detailed guidance.