The Agent Workbench lets coding agents delegate validation tasks to webmate and get back structured evidence from the running product, instead of raw browser noise.
What it is
It is not a new prompt UI for human testers. It is the agent-facing side of webmate: a structured way for coding agents and agent harnesses to delegate QA tasks and receive evidence-backed feedback from the running product.
Who uses it?
Coding agents and agent harnesses.
What gets tested?
The running web or mobile product.
What does webmate provide?
QA tasks, validation, findings, evidence, and feedback.
What it is not
Not a prompt UI for humans, not a generic agent platform, not a replacement for specs or CI.
The coding agent is the user. The running product is what gets tested.
The gap
Code, prompts, and tests live in symbols. Quality lives in reality. Bridging the two takes a chain of translations, and each one quietly adds assumptions.
Symbolic world
Running reality
The concept
Browser control isn't enough. An agent can click buttons, read the DOM, inspect logs, or drive DevTools. But it still needs to know what to check, what evidence matters, and whether the result can be trusted. The Agent Workbench gives agents a structured way to delegate QA tasks to webmate and get useful feedback from the running product.
Instead of low-level commands
Agents ask QA questions
From commands to QA questions.
The analogy
Humans need an environment to inspect, test, and understand software. Coding agents need something similar, but programmatic, task-based, and evidence-oriented. In webmate, humans use the Workbench interactively. Agents use the Agent Workbench through APIs, skills, MCP, CLI, or other harness integrations.
Workbench: a person drives the session, inspects, and judges.
Agent Workbench: called from a coding agent via API, skill, MCP, or CLI.
workbench.validate({ target: "signup", spec: "password-rules", on: ["iOS Safari", "Pixel 8"] })
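To make the shape of such a call more concrete, here is a minimal TypeScript sketch. Only the call itself (`workbench.validate` with `target`, `spec`, and `on`) comes from the line above. The types, the result shape, and the local stand-in for `workbench` are assumptions for illustration, not a published webmate API. The status values mirror the "observed, proven, or left uncertain" wording used further down this page.

```typescript
// Hypothetical types: none of these names come from a published webmate API.
type EvidenceStatus = "proven" | "observed" | "uncertain";

interface ValidationRequest {
  target: string; // product area under test, e.g. "signup"
  spec: string;   // identifier of the spec to validate against
  on: string[];   // browsers or devices to run on
}

interface Finding {
  environment: string;
  status: EvidenceStatus;
  summary: string;
}

interface ValidationResult {
  request: ValidationRequest;
  findings: Finding[];
}

// Local stand-in for the real service: it returns one placeholder finding
// per environment so the exchange can be exercised without a network call.
const workbench = {
  validate(request: ValidationRequest): ValidationResult {
    if (request.on.length === 0) {
      throw new Error("at least one environment is required");
    }
    const findings = request.on.map((environment): Finding => ({
      environment,
      status: "uncertain",
      summary: `No evidence collected yet for ${request.target} on ${environment}`,
    }));
    return { request, findings };
  },
};

const result = workbench.validate({
  target: "signup",
  spec: "password-rules",
  on: ["iOS Safari", "Pixel 8"],
});

// The agent filters structured findings instead of parsing raw browser output.
const unresolved = result.findings.filter((f) => f.status !== "proven");
console.log(`${unresolved.length} of ${result.findings.length} environments still need evidence`);
```

The point of the sketch is the contract, not the stub: the agent asks a QA question and gets back findings it can filter by evidence status.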
The bigger picture
Coding agents don't turn intent into truth in one step. Every transition from intent to spec to code to running product to evidence adds assumptions, and uncertainty grows unless something pushes back. Spec-driven development walks the whole chain. Vibe coding skips the spec and hopes reality cooperates. webmate aims to add a control at every transition on either path, starting with runtime validation.
Now
Runtime validation: does the running product behave as intended?
Next
Control uncertainty across the full chain, from product intent to runtime evidence.
Intent: what the user or organization wants.
Spec: the agent's hypothesis about what was meant. The spec is an interpretation of intent, not the intent itself.
Code: a model of how the spec could become software. Code is a proposal for how intent could become behavior.
Running product: the product as users experience it.
Evidence: what was observed, proven, or left uncertain.
Under Interpretation
Under Implementation
Under Execution
Under Observation
What remains unknown after validation. The Agent Workbench makes the gaps visible.
Without control mechanisms
With Agent Workbench
Stable product semantics
When a checkout breaks, agents need to answer whether the product still meets its specs. A green test isn't enough. webmate anchors that question to a stable Test Object that survives changes in tests, code, and UI.
The Test Object stays stable while tests, implementations, and interfaces evolve.
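One way to picture this is a stable identifier that changing artifacts point back to. The TypeScript sketch below is a hypothetical data model, not webmate's schema: `TestObject`, `TestArtifact`, their fields, and `artifactsFor` are illustrative names introduced here.

```typescript
// Hypothetical model: field names are illustrative, not webmate's schema.
interface TestObject {
  id: string;          // stable identifier, e.g. "checkout"
  description: string; // what the product must do, in product terms
}

interface TestArtifact {
  testObjectId: string;         // link back to the stable Test Object
  kind: "test" | "code" | "ui";
  revision: string;             // changes whenever the artifact changes
}

const checkout: TestObject = {
  id: "checkout",
  description: "A customer can pay for the items in the cart",
};

// Look up everything currently attached to a Test Object.
function artifactsFor(obj: TestObject, artifacts: TestArtifact[]): TestArtifact[] {
  return artifacts.filter((a) => a.testObjectId === obj.id);
}

// Two snapshots of the project: tests and UI were revised in between.
const before: TestArtifact[] = [
  { testObjectId: "checkout", kind: "test", revision: "v1" },
  { testObjectId: "checkout", kind: "ui", revision: "v1" },
];
const after: TestArtifact[] = [
  { testObjectId: "checkout", kind: "test", revision: "v2" },
  { testObjectId: "checkout", kind: "ui", revision: "v3" },
  { testObjectId: "login", kind: "code", revision: "v1" },
];

// The same Test Object resolves in both snapshots, even though every
// artifact attached to it has a new revision.
const revisionsBefore = artifactsFor(checkout, before).map((a) => a.revision);
const revisionsAfter = artifactsFor(checkout, after).map((a) => a.revision);
console.log(checkout.id, revisionsBefore, revisionsAfter);
```

Under this assumption, "did this change break checkout?" stays answerable across refactors, because the question is keyed to the Test Object rather than to any single test file or selector.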
Use cases
“Check whether this PR broke checkout.”
“Find out why login sometimes fails after redirect.”
“Compare this release candidate with the previous baseline.”
“Show me which user flows aren’t covered by the tests on this change.”
Let's talk
We're validating the Agent Workbench concept with QA and engineering leaders across DACH. If this problem sounds familiar, we'd like to talk, especially if you think we've framed it wrong.
Exploratory, not sales. We're gathering perspectives from practitioners.
We respect your inbox. No newsletter, no sales sequence, just a conversation.