Karate Agent Documentation

Karate Agent is an AI-native browser testing platform built on Karate. It provisions on-demand browser containers from a lightweight grid server and supports two modes of operation.

Two Modes

Mode | Who drives the browser | LLM needed? | Use case
----- | ----- | ----- | -----
Interactive | Client-side LLM (Claude Code, Cursor, etc.) sends JS commands via REST | No (grid is a proxy) | Exploratory testing, debugging, live demos
Autonomous | Worker-side LLM drives observe-decide-act loop inside karate-agent container | Yes (worker-configured) | CI/CD, scheduled tests, batch jobs

Both modes share the same session lifecycle and API surface.

The Agent API

The core API is designed for LLM consumption — minimal, structured, and token-efficient:

  • look() — Discover elements on the page. Returns structured JSON with {role, name, locator, actions} per element. Subsequent calls return diffs only.
  • act(locator, action, value?) — Interact with an element. Display-text locators like {button}Submit use visible text, not CSS/XPath.
  • wait(condition) — Wait for a condition (navigation, element, text change).
  • match(actual, expected) — Karate’s structural matching with schema markers.
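
To make the "diffs only" behavior of look() more concrete, here is a minimal sketch of how a client could model it. Only the element shape {role, name, locator, actions} and the {button}Submit locator come from the list above. The function name diffSnapshots, the added/removed keys, the action names, and the sample elements are assumptions made for illustration; the real response format of Karate Agent may differ.

```javascript
// Hypothetical sketch, not the Karate Agent implementation.
// Elements use the {role, name, locator, actions} shape described above.
function diffSnapshots(previous, current) {
  const previousLocators = new Set(previous.map((el) => el.locator));
  const currentLocators = new Set(current.map((el) => el.locator));
  return {
    // Elements present now that were not in the previous snapshot.
    added: current.filter((el) => !previousLocators.has(el.locator)),
    // Locators from the previous snapshot that are no longer present.
    removed: previous
      .filter((el) => !currentLocators.has(el.locator))
      .map((el) => el.locator),
  };
}

// Made-up sample data: a login page before and after submitting the form.
const firstLook = [
  { role: 'link', name: 'Forgot password', locator: '{a}Forgot password', actions: ['click'] },
  { role: 'button', name: 'Submit', locator: '{button}Submit', actions: ['click'] },
];
const secondLook = [
  { role: 'button', name: 'Submit', locator: '{button}Submit', actions: ['click'] },
  { role: 'link', name: 'Sign out', locator: '{a}Sign out', actions: ['click'] },
];

const diff = diffSnapshots(firstLook, secondLook);
console.log(JSON.stringify(diff, null, 2));
```

In this sketch the unchanged Submit button does not appear in the second result at all, which is the property that would keep repeated look() calls cheap in tokens.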

Key Concepts

  • Flows — Reusable .js scripts that execute at native speed via Flow.run(). No LLM tokens consumed.
  • Interactive Mode — Explore a live browser session through your existing LLM coding agent.
  • Autonomous Mode — Submit jobs that run flows first, with LLM recovery for failures.
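
One plausible reading of "flows first, with LLM recovery for failures" is a try-then-fall-back loop, sketched below. The names runJob, runFlow, and recoverWithLlm, the per-step granularity, and the file names are all hypothetical; they are placeholders standing in for whatever the worker does internally, not Karate Agent APIs.

```javascript
// Hypothetical sketch of "flows first, LLM recovery on failure".
// runFlow and recoverWithLlm are placeholder callbacks, not Karate Agent APIs.
async function runJob(steps, runFlow, recoverWithLlm) {
  const results = [];
  for (const step of steps) {
    try {
      // Fast path: the scripted flow runs without consuming LLM tokens.
      await runFlow(step);
      results.push({ step, via: 'flow' });
    } catch (error) {
      // Slow path: hand the failed step and its error to the LLM-driven loop.
      await recoverWithLlm(step, error);
      results.push({ step, via: 'llm' });
    }
  }
  return results;
}

// Stubbed usage: the second (made-up) flow fails and is handed to recovery.
const flakyFlow = async (step) => {
  if (step === 'checkout.js') throw new Error('element not found');
};
const noopRecovery = async () => {};

runJob(['login.js', 'checkout.js'], flakyFlow, noopRecovery)
  .then((results) => console.log(results));
```

The point of the ordering is cost: the LLM is only consulted for the steps whose scripted flow actually failed.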

Requirements

  • Java 21+
  • Docker (for browser containers)
  • LLM API key, for autonomous mode only (any OpenAI-compatible, Anthropic, or Ollama endpoint)

Next Steps

  • Quick Start — Get running in under 60 seconds
  • Architecture — Understand the grid server, workers, and session lifecycle