Documentation

Bunsen is an experiment runner for agentic systems: give an agent an environment, run it reproducibly, capture everything, and evaluate the result. New here? Start with Introduction, then Getting Started.

Start Here

Orientation, plus the fastest path to a first run and its result.

Concepts

How a run is composed: experiments, agents, and the container.

Authoring

Write the config files and recipes that define experiments and agents.

Evaluation

Define scoring criteria and choose where and how scorers run.

Suites

Consume published benchmark suites and author your own.

Reference

Command, project-config, run-output, platform, and cost references.