Context Management
The context management module (webgym/context/) handles conversation building and response parsing for the web agent.
Module Structure
webgym/context/
├── __init__.py
├── context_manager.py # Main context manager class
├── universal_prompt.py # Prompt templates
└── parsers/ # Action parsers
    ├── __init__.py
    ├── base_parser.py # Abstract parser interface
    ├── coordinates_parser.py # Coordinates mode parser
    └── set_of_marks_parser.py # Set-of-marks mode parser
ContextManager
The ContextManager class (context_manager.py) is the central component that:
- Manages conversation building for different model types
- Handles response parsing
- Supports different interaction modes (coordinates vs. set-of-marks)
Initialization:
from webgym.context import ContextManager

context_manager = ContextManager(
    context_config={'interaction_mode': 'coordinates'},  # default is 'set_of_marks'
    model_config={'model_type': 'qwen3-instruct'},
    verbose=True,
)
Key Methods:
- build_conversation(task, trajectory, current_observation, **kwargs): Builds a model-specific conversation from the current state.
- parse_response(raw_response): Parses the model’s raw response into structured action data.
- get_interaction_mode(): Returns the current interaction mode ('coordinates' or 'set_of_marks').
Note
The interaction_mode value must use underscores: set_of_marks (not set-of-marks with hyphens).
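Code that takes the mode name from user input or a config file may want to catch the hyphenated spelling before it reaches ContextManager. The helper below is a hypothetical sketch and not part of webgym; only the two mode names are taken from this page.

```python
# Mode names as documented above; everything else here is illustrative.
VALID_MODES = {"coordinates", "set_of_marks"}


def normalize_interaction_mode(mode: str) -> str:
    """Return a valid interaction_mode string or raise ValueError.

    Hypothetical helper: hyphens are converted to underscores, so
    'set-of-marks' becomes 'set_of_marks'.
    """
    normalized = mode.strip().lower().replace("-", "_")
    if normalized not in VALID_MODES:
        raise ValueError(f"unknown interaction_mode: {mode!r}")
    return normalized


# The resulting dict has the same shape as the context_config shown earlier.
context_config = {"interaction_mode": normalize_interaction_mode("set-of-marks")}
```

Whether the real ContextManager rejects or silently ignores an unknown mode is not stated here, so validating up front is a defensive choice rather than a requirement.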
Interaction Modes
- Coordinates Mode:
  Actions are specified using pixel coordinates (x, y). Example: click(500, 300)
- Set-of-Marks Mode:
  Actions reference numbered UI elements. Example: click([15]) (clicks element #15)
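To make the difference between the two modes concrete, here is a minimal sketch that parses a click in either form. It assumes the action text looks exactly like the two examples above; the function name, regular expressions, and returned dictionary keys are illustrative and are not the actual parsers in webgym/context/parsers/.

```python
import re

# Patterns inferred from the two examples above (assumptions, not webgym's own).
COORD_CLICK = re.compile(r"^click\(\s*(\d+)\s*,\s*(\d+)\s*\)$")
MARK_CLICK = re.compile(r"^click\(\s*\[(\d+)\]\s*\)$")


def parse_click(text: str, mode: str) -> dict:
    """Parse a click action string according to the interaction mode."""
    text = text.strip()
    if mode == "coordinates":
        match = COORD_CLICK.match(text)
        if match:
            return {"action": "click", "x": int(match.group(1)), "y": int(match.group(2))}
    elif mode == "set_of_marks":
        match = MARK_CLICK.match(text)
        if match:
            return {"action": "click", "element_id": int(match.group(1))}
    else:
        raise ValueError(f"unknown interaction mode: {mode!r}")
    # The mode is known but the text does not have that mode's click form.
    raise ValueError(f"{text!r} is not a valid click in {mode} mode")
```

For example, parse_click("click(500, 300)", "coordinates") yields a dictionary with x=500 and y=300, while the same string in set_of_marks mode raises ValueError because it does not reference a numbered element.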
Parsers
The parsers in webgym/context/parsers/ convert raw model responses into executable actions. Each parser:
- Extracts the action type (click, type, scroll, etc.)
- Parses the action parameters
- Validates the action format
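The three steps above can be sketched as a single function. This is an illustration under stated assumptions, not the interface defined in base_parser.py: the action table, the expected argument counts, and the output dictionary are all hypothetical, and arguments are assumed to be written as Python literals.

```python
import ast
import re

# Hypothetical action table: action name -> expected number of arguments.
KNOWN_ACTIONS = {"click": 2, "type": 1, "scroll": 1}
CALL_RE = re.compile(r"^(\w+)\((.*)\)$", re.DOTALL)


def parse_action(raw: str) -> dict:
    """Turn a string such as 'click(500, 300)' into structured action data."""
    match = CALL_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"not an action call: {raw!r}")

    # Step 1: extract the action type.
    action_type, arg_text = match.group(1), match.group(2)

    # Step 2: parse the parameters as a tuple of Python literals.
    try:
        args = ast.literal_eval(f"({arg_text},)") if arg_text.strip() else ()
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"cannot parse arguments: {arg_text!r}") from exc

    # Step 3: validate the action name and the argument count.
    if action_type not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {action_type!r}")
    if len(args) != KNOWN_ACTIONS[action_type]:
        raise ValueError(f"{action_type} expects {KNOWN_ACTIONS[action_type]} argument(s)")

    return {"action": action_type, "args": list(args)}
```

Using ast.literal_eval rather than eval keeps model output from executing arbitrary code, which matters because the input is untrusted text generated by a model.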