Replay Buffer
=============

The replay buffer module (``webgym/data/``) manages training data storage and sampling for RL training.

Module Structure
----------------

.. code-block:: text

   webgym/data/
   ├── __init__.py
   ├── replay_buffer.py          # Main replay buffer class
   ├── components.py             # Data structures (Task, Action, Reward, etc.)
   └── response_decomposer.py    # Response parsing utilities

ReplayBuffer
------------

The ``ReplayBuffer`` class (``replay_buffer.py``) extends PyTorch's ``Dataset`` and provides:

- Trajectory storage and management
- Filtering for successful/unsuccessful trajectories
- Same-screenshot step filtering
- Support for distributed training

**Initialization:**

.. code-block:: python

   from webgym.data import ReplayBuffer

   replay_buffer = ReplayBuffer(
       trajectories=trajectory_list,
       agent=web_agent,
       capacity=None,  # None = unlimited
       filter_successful_only=False,
       include_reward_in_sample=True,
       shuffle=False,
       filter_same_screenshot=True
   )

**Key Parameters:**

``trajectories``
   List of trajectory data to process

``agent``
   WebAgent instance for context management

``capacity``
   Maximum number of samples to store (optional; ``None`` means unlimited)

``filter_successful_only``
   If True, only samples from successful trajectories are accessible

``include_reward_in_sample``
   Include reward information in each sample (default: True)

``shuffle``
   Shuffle samples (default: False)

``filter_same_screenshot``
   Filter out steps where the screenshot has not changed (default: True)

Data Components
---------------

``webgym/data/components.py`` defines the core data structures:

- ``Task``: Task description and metadata (task_name, domain, subdomain, website, difficulty, evaluator_reference, reference_answer, attempt_level, task_id, max_steps, trajectory_index)
- ``Observation``: Screenshot and page state (task, image_path, ac_tree, page_metadata)
- ``Action``: Agent action (action, action_string)
- ``Response``: Model response (raw_response, answering_tokens, raw_prompt)
- ``Reward``: Task completion reward (reward, evaluation, is_blocked, submit, submission_judgment)

**Key Methods:**

``get_training_samples(num_samples=None, recency_bias_power=1.0)``
   Returns training-eligible samples using recency-weighted sampling. Samples are drawn from the steps of successful trajectories. If ``recency_bias_power`` is 1.0, sampling is uniformly random.
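
The weighting formula behind ``recency_bias_power`` is not spelled out here, so the following is only a minimal sketch of one scheme that matches the described behaviour. It assumes each sample's weight is ``rank ** (recency_bias_power - 1)``, where rank 1 is the oldest sample, so a power of 1.0 gives equal weights. The helper names ``recency_weights`` and ``sample_with_recency_bias``, the ``seed`` argument, sampling with replacement, and the treatment of ``num_samples=None`` as "draw as many as there are samples" are all hypothetical and not part of the ``ReplayBuffer`` API.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def recency_weights(n: int, recency_bias_power: float = 1.0) -> List[float]:
    """Assumed weighting: rank ** (power - 1), with rank 1 = oldest, n = newest.

    At power 1.0 every weight is 1.0, i.e. uniform sampling.
    """
    return [float(rank) ** (recency_bias_power - 1.0) for rank in range(1, n + 1)]


def sample_with_recency_bias(
    samples: Sequence[T],
    num_samples: Optional[int] = None,
    recency_bias_power: float = 1.0,
    seed: Optional[int] = None,
) -> List[T]:
    """Draw samples (with replacement), favouring later entries when power > 1."""
    if not samples:
        return []
    # Assumption: None means "draw as many items as the buffer holds".
    k = len(samples) if num_samples is None else num_samples
    rng = random.Random(seed)
    weights = recency_weights(len(samples), recency_bias_power)
    return rng.choices(samples, weights=weights, k=k)


# Samples are assumed to be ordered oldest to newest.
steps = [f"step_{i}" for i in range(10)]
uniform_batch = sample_with_recency_bias(steps, num_samples=4, recency_bias_power=1.0, seed=0)
recent_batch = sample_with_recency_bias(steps, num_samples=4, recency_bias_power=3.0, seed=0)
```

Under this assumed scheme, raising the power above 1.0 shifts probability mass toward the most recently added samples, while 1.0 leaves every sample equally likely.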