Projects
A suite of benchmarks for building autonomous web agents.
WebArena
A realistic web environment for building autonomous agents.
WebArena-Infinity
Continuous and scalable web agent evaluation in evolving environments.
VisualWebArena
Evaluating multimodal agents on realistic visual web tasks.
TheAgentCompany
Benchmarking LLM agents on consequential real-world tasks in a simulated company.
WebArena-x