Projects
A suite of benchmarks for building autonomous web agents.
WebArena
A realistic web environment for building autonomous agents.
NeurIPS 2024 · Oral
WebArena-Infinity
Continuous and scalable web agent evaluation in evolving environments.
VisualWebArena
Evaluating multimodal agents on realistic visual web tasks.
ACL 2024
TheAgentCompany
Benchmarking LLM agents on consequential real-world tasks in a simulated company.
ICML 2025