Documentation

Bunsen is an experiment runner for agentic systems: give an agent an environment, run it reproducibly, capture everything, and evaluate the result. New here? Start with Introduction, then Getting Started.

Start Here

Orientation, plus the fastest path to a first run and its result.

Introduction
Getting Started
Run a Terminal Bench Task
Bring Your Own Task

Concepts

How a run is composed: experiments, agents, and the container.

How Bunsen Works
The Environment Model
Environment Internals
Trust Model

Authoring

Write the config files and recipes that define experiments and agents.

experiment.yaml Reference
agent.yaml Reference
Agent Dependencies Cookbook
System Prompts
Running as Root
Supervised Mode
Agent Skills

Evaluation

Define scoring criteria and choose where and how scorers run.

Scorers & Evaluation
Scoring in the Agent Container
Scoring Service Tasks

Suites

Consume published benchmark suites and author your own.

Suites

Reference

Command, project-config, run-output, platform, and cost references.

CLI Reference
Project Configuration
Run Manifest & Events
Exporting a Run's Workspace
Cost Accounting
Platforms & Architecture
Packages & Schemas
Glossary