Rollout Script: rollout.py (Optional Read)
The scripts/rollout.py script handles trajectory collection from web environments using a WebAgent.
It is the core data collection component of the RL pipeline.
Overview
This script:
Loads tasks from JSONL files and samples them for rollout
Creates a WebAgent with vLLM inference backend
Runs asynchronous trajectory collection via AsyncWebGym
Saves trajectories incrementally with checkpoint callbacks
Calculates and logs rollout metrics
The script is invoked by run.sh and uses Hydra for configuration.
Entry Point
python scripts/rollout.py [hydra_overrides...]
In practice, run.sh always overrides with --config-name rollout_train or --config-name rollout_test:
python scripts/rollout.py --config-name rollout_train save_path=/data/exp1 data_path=/data/shared
python scripts/rollout.py --config-name rollout_test save_path=/data/exp1 data_path=/data/shared
Key Components
WebAgent Initialization:
agent = WebAgent(
    policy_config=config.policy_config,
    context_config=config.context_config,
    model_config=model_config,
    save_path=config.save_path,
    vllm_server_url=vllm_server_url,
    openai_config=config.openai_config,
    ...
)
Task Sampling:
uniform: Random uniform sampling
ratio: Ratio-based sampling
Training tasks from blocked domains are filtered out; test tasks are not filtered.
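As a rough illustration of the filtering and uniform sampling described above, here is a minimal sketch. It is not the code in rollout.py: the helper names, the "url" field on each task, matching blocked domains by exact hostname, sampling without replacement, and the fixed seed are all assumptions made for the example. The ratio mode is left out because its exact semantics are not described here.

```python
import json
import random
from urllib.parse import urlparse


def parse_jsonl(text):
    """Parse JSONL text into a list of task dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def filter_and_sample(tasks, blocked_domains, num_samples, is_train, seed=0):
    """Hypothetical helper: drop blocked-domain tasks (training only), then sample uniformly."""
    if is_train:
        # Assumption: each task has a "url" field and domains match by exact hostname.
        tasks = [t for t in tasks if urlparse(t["url"]).netloc not in blocked_domains]
    rng = random.Random(seed)
    # Sample without replacement; never ask for more tasks than are available.
    k = min(num_samples, len(tasks))
    return rng.sample(tasks, k)
```

With two tasks, one of them on a blocked domain, a training call returns only the unblocked task, while a test call can return both.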
Multi-Node Support:
Resources are divided proportionally based on rank_load_weight. Each node gets a slice of the server pool. Trajectory files are saved with rank suffixes (e.g., train_trajectories.pt.iteration5_rank_0).
Example: server_size=64, rank_weights="2,1,1" across 3 nodes
Node 0 (weight 2): 32 instances, 50% of tasks
Node 1 (weight 1): 16 instances, 25% of tasks
Node 2 (weight 1): 16 instances, 25% of tasks
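The arithmetic in this example can be reproduced with a small helper. This is a sketch under assumptions, not the script's actual logic: the function name split_by_weight is hypothetical, and the rule for handing out leftover instances when the division is not exact (one extra per rank, in rank order) is a guess.

```python
def split_by_weight(total: int, rank_weights: str) -> list[int]:
    """Split `total` instances across ranks in proportion to comma-separated weights."""
    weights = [float(w) for w in rank_weights.split(",")]
    weight_sum = sum(weights)
    # Floor each proportional share first.
    shares = [int(total * w / weight_sum) for w in weights]
    # Assumption: any leftover instances go to the lowest ranks, one each.
    remainder = total - sum(shares)
    for i in range(remainder):
        shares[i % len(shares)] += 1
    return shares


print(split_by_weight(64, "2,1,1"))  # [32, 16, 16]
```

The same proportions (50%, 25%, 25%) would apply to the task list for each node.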
Metrics
calculate_rollout_metrics() computes: success_rate, avg_steps, avg_chars_per_response, goback_rate, crashed_rate, blocked_rate.
Crash Detection: a trajectory is counted as “crashed” if any of the following holds:
Empty or has no steps
First action is invalid_url
Contains dummy actions (None values)
Has fewer than 10 steps AND doesn’t end with an answer AND is not blocked
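The crash rules above can be written as a single predicate. The sketch below assumes a simplified trajectory representation that is not taken from the codebase: a sequence of action names (strings, or None for a dummy action), a final action named "answer" when the agent answered, and a separate blocked flag. The function name is_crashed is likewise hypothetical.

```python
from typing import Optional, Sequence

MIN_STEPS = 10  # threshold from the rules above


def is_crashed(actions: Sequence[Optional[str]], blocked: bool = False) -> bool:
    """Return True if the trajectory matches any of the crash conditions."""
    # Rule 1: empty trajectory, or no steps at all.
    if not actions:
        return True
    # Rule 2: the first action is invalid_url.
    if actions[0] == "invalid_url":
        return True
    # Rule 3: the trajectory contains dummy (None) actions.
    if any(a is None for a in actions):
        return True
    # Rule 4: short, did not end with an answer, and was not blocked.
    ended_with_answer = actions[-1] == "answer"  # assumed action name
    if len(actions) < MIN_STEPS and not ended_with_answer and not blocked:
        return True
    return False
```

Under this reading, a short trajectory that ends with an answer or that was blocked is not counted as crashed, so it contributes to success_rate or blocked_rate rather than crashed_rate.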
Output
Trajectories are saved to <save_path>/train_trajectories/ or <save_path>/test_trajectories/:
<save_path>/
├── train_trajectories/
│   ├── train_trajectories.pt.iteration0        # Single-node
│   ├── train_trajectories.pt.iteration1
│   └── train_trajectories.pt.iteration2
└── test_trajectories/
    ├── test_trajectories.pt.iteration1_rank_0  # Multi-node: per-rank files
    ├── test_trajectories.pt.iteration1_rank_1
    └── test_trajectories.pt.iteration1         # Aggregated by master
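
The naming scheme in the tree above can be expressed as a small path builder, which may help when locating files for a given iteration or rank. The function name trajectory_path and its parameters are hypothetical; only the directory and file name patterns come from the layout shown.

```python
from pathlib import Path
from typing import Optional


def trajectory_path(save_path: str, split: str, iteration: int,
                    rank: Optional[int] = None) -> Path:
    """Build the trajectory file path for a split ("train" or "test") and iteration.

    Pass `rank` for a per-rank file from a multi-node run; omit it for the
    single-node file or the aggregated file.
    """
    name = f"{split}_trajectories.pt.iteration{iteration}"
    if rank is not None:
        name += f"_rank_{rank}"
    return Path(save_path) / f"{split}_trajectories" / name


print(trajectory_path("/data/exp1", "test", 1, rank=0).as_posix())
# /data/exp1/test_trajectories/test_trajectories.pt.iteration1_rank_0
```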