Karate Agent Documentation

Karate Agent is an AI-native browser testing platform built on Karate. It provisions on-demand browser containers from a lightweight grid server and supports two modes of operation.

Two Modes

Mode	Who drives the browser	Server-side LLM needed?	Use case
Interactive	Client-side LLM (Claude Code, Cursor, etc.) sends JS commands via REST	No (grid is a proxy)	Exploratory testing, debugging, live demos
Autonomous	Worker-side LLM drives an observe-decide-act loop inside the karate-agent container	Yes (worker-configured)	CI/CD, scheduled tests, batch jobs

Both modes share the same session lifecycle and API surface.
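
The observe-decide-act loop named in the Autonomous row can be pictured with a minimal sketch. Everything here is an assumption made for illustration: runLoop, its parameters, and the toy counter page are hypothetical names, and the decide callback merely stands in for the worker-side LLM. It is not the karate-agent implementation.

```javascript
// Generic observe-decide-act loop. `decide` stands in for the LLM:
// it receives the latest observation and returns the next action,
// or { done: true } once the goal is reached.
function runLoop({ observe, decide, act, maxSteps = 10 }) {
  const history = [];
  for (let i = 0; i < maxSteps; i++) {
    const observation = observe();
    const decision = decide(observation, history);
    if (decision.done) return { done: true, steps: history };
    act(decision);
    history.push(decision);
  }
  // Step budget exhausted before the goal was reached.
  return { done: false, steps: history };
}

// Toy "page": a counter that goes up on every action (hypothetical).
let count = 0;
const loopResult = runLoop({
  observe: () => ({ count }),
  decide: (obs) =>
    obs.count >= 3
      ? { done: true }
      : { action: "click", locator: "{button}Increment" },
  act: () => { count += 1; },
});

console.log(loopResult.done, loopResult.steps.length);
```

The maxSteps bound is a design choice of the sketch: an LLM-driven loop needs some budget so that a confused policy cannot run indefinitely.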

The Agent API

The core API is designed for LLM consumption — minimal, structured, and token-efficient:

look() — Discover elements on the page. Returns structured JSON with {role, name, locator, actions} per element. Subsequent calls return diffs only.
act(locator, action, value?) — Interact with an element. Display-text locators like {button}Submit use visible text, not CSS/XPath.
wait(condition) — Wait for a condition (navigation, element, text change).
match(actual, expected) — Karate’s structural matching with schema markers.
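
To make the call shapes above concrete, here is a small sketch that runs against an in-memory stub rather than a real browser. It is an assumption-heavy illustration, not the Karate Agent implementation: createStubAgent, the page data, and every locator except {button}Submit are made up, look() reports only newly visible elements as its "diff", wait() is omitted, and match() is a simplified re-implementation that understands only a handful of Karate-style markers.

```javascript
// Hypothetical stand-in for a browser session; mimics look() and act() only.
function createStubAgent(elements) {
  let lastSeen = null; // locators returned by the previous look()
  return {
    // First call returns every visible element; later calls return only new ones.
    look() {
      const visible = elements.filter((e) => e.visible);
      const fresh = lastSeen === null
        ? visible
        : visible.filter((e) => !lastSeen.has(e.locator));
      lastSeen = new Set(visible.map((e) => e.locator));
      return fresh.map(({ role, name, locator, actions }) => ({ role, name, locator, actions }));
    },
    // Resolve the element by its display-text locator, then apply the action.
    act(locator, action, value) {
      const el = elements.find((e) => e.visible && e.locator === locator);
      if (!el) throw new Error(`no visible element for ${locator}`);
      if (!el.actions.includes(action)) throw new Error(`${action} not supported on ${locator}`);
      if (action === "input") el.value = value;
      if (action === "click" && el.onClick) el.onClick();
      return { ok: true };
    },
  };
}

// Simplified structural match; supports only a few Karate-style markers.
function match(actual, expected) {
  if (typeof expected === "string" && expected.startsWith("#")) {
    switch (expected) {
      case "#ignore": return true;
      case "#notnull": return actual !== null && actual !== undefined;
      case "#string": return typeof actual === "string";
      case "#number": return typeof actual === "number";
      case "#boolean": return typeof actual === "boolean";
      case "#array": return Array.isArray(actual);
      default: throw new Error(`unsupported marker ${expected}`);
    }
  }
  if (Array.isArray(expected)) {
    return Array.isArray(actual) && actual.length === expected.length &&
      expected.every((item, i) => match(actual[i], item));
  }
  if (expected !== null && typeof expected === "object") {
    if (actual === null || typeof actual !== "object" || Array.isArray(actual)) return false;
    const keys = Object.keys(expected);
    return keys.length === Object.keys(actual).length &&
      keys.every((k) => k in actual && match(actual[k], expected[k]));
  }
  return actual === expected;
}

// Made-up page: an email field, a submit button, and a banner shown after submit.
const banner = { role: "status", name: "Saved", locator: "{div}Saved", actions: [], visible: false };
const page = [
  { role: "textbox", name: "Email", locator: "{input}Email", actions: ["input"], visible: true },
  { role: "button", name: "Submit", locator: "{button}Submit", actions: ["click"], visible: true,
    onClick: () => { banner.visible = true; } },
  banner,
];

const agent = createStubAgent(page);
const first = agent.look();                       // both visible elements
agent.act("{input}Email", "input", "a@example.com");
agent.act("{button}Submit", "click");
const second = agent.look();                      // only the newly visible banner
const shapeOk = match(second[0], { role: "status", name: "#string", locator: "#string", actions: "#array" });

console.log(first.length, second.map((e) => e.locator), shapeOk);
```

Returning only the changes on later look() calls is what keeps the exchange token-efficient: the LLM sees the full page once and then only what its last action altered.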

Key Concepts

Flows: Reusable .js scripts that execute at native speed via Flow.run(). They consume no LLM tokens.
Interactive Mode: Browser exploration driven by your existing LLM coding agent (Claude Code, Cursor, etc.), which sends commands to the grid over REST.
Autonomous Mode: Submit jobs that run flows first and bring in the worker-side LLM only to recover from failures.
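
The "flows first, LLM recovery for failures" idea can be sketched as follows. This is only an illustration under stated assumptions: runJob, the step objects, and the recover callback are hypothetical names, the recover callback stands in for the worker-side LLM, and the code is synchronous for brevity. The actual job format and recovery behavior of Karate Agent are not described in this page.

```javascript
// Run scripted steps in order. A step that throws is handed to `recover`
// (a stand-in for the LLM), which may return a replacement function to try.
function runJob(steps, recover) {
  const report = [];
  for (const step of steps) {
    try {
      step.run(); // scripted path: no LLM involved
      report.push({ name: step.name, status: "passed", recovered: false });
    } catch (err) {
      const fix = recover(step, err); // hypothetical recovery hook
      try {
        if (!fix) throw err;
        fix();
        report.push({ name: step.name, status: "passed", recovered: true });
      } catch (finalErr) {
        report.push({ name: step.name, status: "failed", recovered: false, error: finalErr.message });
        break; // stop the job at the first unrecoverable step
      }
    }
  }
  return report;
}

// Made-up job: the second step fails and is recovered.
const jobSteps = [
  { name: "open login", run: () => {} },
  { name: "click submit", run: () => { throw new Error("element not found: {button}Submit"); } },
  { name: "check dashboard", run: () => {} },
];
const llmCalls = [];
const jobReport = runJob(jobSteps, (step) => {
  llmCalls.push(step.name); // record when the "LLM" was consulted
  return () => {};          // pretend it found a working alternative
});

console.log(jobReport.map((r) => `${r.name}: ${r.status}${r.recovered ? " (recovered)" : ""}`));
console.log("LLM consulted for:", llmCalls);
```

In this sketch the recovery hook is consulted only for the one failing step, so a job whose flows all pass would spend no LLM tokens.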

Requirements

Java 21+
Docker (for browser containers)
LLM API key (for autonomous mode only — any OpenAI-compatible, Anthropic, or Ollama endpoint)

Next Steps

Quick Start — Get running in under 60 seconds
Architecture — Understand the grid server, workers, and session lifecycle