NEW · GitHub triggers + Slack/Discord notifications shipped

One QA board for humans and AI agents.

A universal execution board where testers and autonomous agents collaborate on the same interface. AI progressively takes over the cognitive work — writing test cases, generating scripts, executing runs, reporting back — without ever leaving the board.

Start free · no card

Local mode · zero infra · self-host on a $5/mo VPS

MIT licensed · OSS · Self-host on a $5/mo VPS · OpenAPI & Postman import

qa-runner.app / user-auth / login / TC-004

staging

User Auth

TC-001 · Logs in valid creds…

TC-002 · Rejects bad password…

TC-003 · Expired token 401…

TC-004 · Refresh token rotates…

TC-005 · Logout revokes…

Signup

Workspaces

Spec · Workflow · Script

1. Read TC context · get_tc_context
2. Resolve dependencies · list_test_cases
3. Generate Part 2 · gen_script · 412 tok
4. Apply fixtures · create_test_user, seed_token
5. Execute pytest · executing… 184ms
6. Capture post-state · queued
7. Report result · queued

agent.log

0.31 ✓ TC spec loaded · 4 APIs

0.71 ✓ deps ok · TC-018, TC-021

0.74 → gen_script(part=2)

2.51 ✓ 24 lines · sonnet-4.5

2.78 ✓ create_test_user · u_4421

3.01 → execute_script()

3.18 collected 1 item

3.42 POST /api/auth/login 200 184ms

3.45 asserting response shape…

AI Assistant · sonnet-4.5

Assert refresh token rotates + access token is valid for /me.

used read api_ref

Adding refresh_token not-equal check + GET /me with the new access…

assert body["refresh_token"] != prev_refresh

Works with the tools your team already runs

Claude · GitHub · MCP · Slack · Discord · OpenAPI · Postman · pytest

The QA gap

QA tooling was built for one user. Now there are two.

Postman is for humans clicking buttons. Pytest is for engineers reading stack traces. Neither was designed for an AI agent that wants to read context, generate a script, run it, and write back the result. QA Runner is.

Today, without QA Runner

Tests live in 5 places. Postman collections, pytest files, Jira tickets, Notion pages, screenshots in Slack.

AI agents can't run them. No machine-readable definition. No tool interface. No way to track results.

Failures get lost. A nightly run fails at 3 AM. You find out at standup.

Onboarding takes weeks. New QAs spend their first month learning where the tests are.

With QA Runner

One board. Feature → Suite → Test Case. Same source of truth for humans and agents.

Agents have an MCP server. Claude can read TCs, run them, and write back results — no glue code.

Slack alerts. A run fails — your channel knows in 2 seconds with the failing assertion inline.

Day-1 productive. Open the board, see every test, run it. No setup, no IDE, no YAML.

Tiny: Dependency surface, auditable in a single afternoon.

5-min: From clone to first run on a fresh machine.

$5/mo: Self-host on a VPS that can run docker compose.

100%: Audit-logged. Every run, every script snapshot.

The loop

From ticket to passing test in four steps.

No ceremony. Drop in a description, let AI do the busywork, override anything you want, and ship the result to the same board your teammates see.

Describe the case

Paste the ticket or a one-liner. Line 1 is the display name; everything below is context the AI reads.
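
To make the "line 1 is the display name" rule concrete, here is a minimal sketch of how a pasted description could be split. The names `CaseDescription` and `split_description` are hypothetical, and treating the first non-empty line as line 1 is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class CaseDescription:
    display_name: str  # first non-empty line of the pasted text
    context: str       # everything after it, handed to the AI as context


def split_description(text: str) -> CaseDescription:
    """Split a pasted ticket or one-liner into display name and AI context."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("description is empty")
    display_name = lines[0].strip()
    # A one-liner simply yields an empty context.
    context = "\n".join(lines[1:]).strip()
    return CaseDescription(display_name, context)


example = split_description(
    "Refresh token rotates\n"
    "After a refresh, the refresh token must change\n"
    "and the new access token must work on GET /me."
)
print(example.display_name)
```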

AI drafts Part 2

Claude generates a pytest script using your API Library, fixtures, and dependency outputs. You review.
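
As a rough illustration of what a drafted script might look like, the sketch below is plain pytest-style Python. `FakeAuthClient` and `make_client` are hypothetical stand-ins (an in-memory fake instead of real HTTP calls) so the example runs anywhere; a real script would use your own fixtures and API Library.

```python
import secrets


class FakeAuthClient:
    """In-memory stand-in for a real HTTP client (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._refresh = secrets.token_hex(8)
        self._access = secrets.token_hex(8)

    def login(self) -> dict:
        return {"access_token": self._access, "refresh_token": self._refresh}

    def refresh(self, refresh_token: str) -> dict:
        if refresh_token != self._refresh:
            raise PermissionError("stale refresh token")
        # Rotate both tokens on every successful refresh.
        self._refresh = secrets.token_hex(8)
        self._access = secrets.token_hex(8)
        return {"access_token": self._access, "refresh_token": self._refresh}

    def me(self, access_token: str) -> int:
        # Return an HTTP-like status code for GET /me.
        return 200 if access_token == self._access else 401


# --- Part 1: human-owned setup ---
def make_client() -> FakeAuthClient:
    return FakeAuthClient()


# --- Part 2: the kind of test logic the AI would draft ---
def test_refresh_token_rotates() -> None:
    client = make_client()
    prev_refresh = client.login()["refresh_token"]

    body = client.refresh(prev_refresh)

    assert body["refresh_token"] != prev_refresh
    assert client.me(body["access_token"]) == 200
```

Because the test is an ordinary function named `test_*`, pytest would collect it without any extra configuration.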

Run anywhere

Locally via docker, in CI via GitHub Actions, or right from the board. Same script, same result schema.

Reported back

Every run lands on the board with run_by attribution and a snapshot of the exact script that executed.

Built for two audiences

Designed for testers. Trusted by agents.


For testers

Feature → Suite → TC tree: The information hierarchy your team already thinks in.
Part 1 / Part 2 split: Setup is yours and stays put. Test logic regenerates from AI on demand.
Fixture library, your way: Bring your own setup functions; they get injected into every run.
Multi-env with secret isolation: Tokens never leave your device. The AI sees URLs and header keys only.
Postman + OpenAPI import: Bring an existing collection and you're 80% set up.
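
Secret isolation in the list above can be pictured as a filtering step before anything reaches the model. This is a sketch under assumptions: the environment shape and the function name `ai_visible_view` are invented for illustration and are not QA Runner's actual data model.

```python
from typing import Any


def ai_visible_view(env: dict[str, Any]) -> dict[str, Any]:
    """Return only what the AI may see: the base URL and header names, never values."""
    return {
        "name": env["name"],
        "base_url": env["base_url"],
        # Keep header names so the AI knows what to send; drop the secret values.
        "header_keys": sorted(env.get("headers", {})),
    }


# Hypothetical environment definition with placeholder secrets.
staging = {
    "name": "staging",
    "base_url": "https://staging.example.com",
    "headers": {"Authorization": "Bearer s3cr3t", "X-Api-Key": "k-123"},
}

view = ai_visible_view(staging)
print(view)
```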


For AI agents

MCP server, public: Any agent (Claude Code, Cursor, custom) connects via SSE over HTTP.
report_result is mandatory: Agents can only persist results through MCP. No back-channel writes.
Tool registry, not prompts: get_tc_context, gen_script, execute_script. Typed, audited, approval-gated.
Script snapshot per run: Every result carries the exact bytes the agent executed. Traceability is structural.
Approval flow, your call: Auto-run, review-each, or human-in-loop on writes. Configurable per env.
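
One way to read the three approval modes is as a per-environment policy lookup. The sketch below is an interpretation, not the product's implementation: the split of tools into "writes", the environment names, and the meaning assigned to each mode are all assumptions.

```python
from enum import Enum


class ApprovalMode(Enum):
    AUTO_RUN = "auto-run"                    # never ask
    REVIEW_EACH = "review-each"              # ask before every tool call
    HUMAN_IN_LOOP_ON_WRITES = "hil-writes"   # ask only before write-like tools


# Assumption: which tools count as "writes" is illustrative only.
WRITE_TOOLS = {"execute_script", "report_result"}

# Hypothetical per-environment configuration.
MODE_BY_ENV = {
    "local": ApprovalMode.AUTO_RUN,
    "staging": ApprovalMode.HUMAN_IN_LOOP_ON_WRITES,
    "production": ApprovalMode.REVIEW_EACH,
}


def needs_approval(env: str, tool: str) -> bool:
    """Decide whether a human must approve this tool call in this environment."""
    # Unknown environments fall back to the strictest mode.
    mode = MODE_BY_ENV.get(env, ApprovalMode.REVIEW_EACH)
    if mode is ApprovalMode.AUTO_RUN:
        return False
    if mode is ApprovalMode.REVIEW_EACH:
        return True
    return tool in WRITE_TOOLS


print(needs_approval("staging", "get_tc_context"))
print(needs_approval("staging", "execute_script"))
```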

Built for the workflow

Everything a QA team and an AI agent both need.

Run tests by hand. Run them on every PR. Run them at 3 AM. Run them from a chat with Claude. Same TC, same result, one history.

AI assistant, in context

Claude sees your TC, your fixtures, your environment, and your last 10 runs. Ask it to write Part 2, refine an assertion, or explain why TC-014 has been flaky for three days. It has tools, not vibes.

claude-sonnet-4.5 · get_tc_context · execute_script · report_result · +8 tools

GitHub triggers

Run on push, PR, or schedule. Block merges on fail.
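
Blocking a merge usually comes down to a CI step that exits non-zero, combined with a required status check on the branch. A minimal sketch of that gate, assuming the run's verdicts are available as a list of strings; `exit_code` is a hypothetical helper, and treating an empty run as a failure is a design choice made here.

```python
def exit_code(verdicts: list[str]) -> int:
    """Return 0 only if at least one test ran and every verdict is "pass"."""
    if not verdicts:
        # Assumption: a run that executed nothing should not let a merge through.
        return 1
    return 0 if all(v == "pass" for v in verdicts) else 1


# In a CI step this value would be passed to sys.exit().
print(exit_code(["pass", "pass"]))
print(exit_code(["pass", "fail"]))
```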

Slack & Discord

Failures land in your channel with the failing TCs already inline.
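
A failure alert with the failing assertion inline could be assembled along these lines. The message layout and the `build_failure_alert` helper are hypothetical; the only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field (Discord webhooks use `content` instead).

```python
import json


def build_failure_alert(env: str, failures: list[dict]) -> dict:
    """Build a Slack incoming-webhook payload listing each failing test case."""
    lines = [f"{len(failures)} test case(s) failed on {env}"]
    for failure in failures:
        # Backticks render the assertion as inline code in Slack.
        lines.append(f"- {failure['tc_id']}: `{failure['assertion']}`")
    return {"text": "\n".join(lines)}


payload = build_failure_alert(
    "staging",
    [{"tc_id": "TC-004", "assertion": 'assert body["refresh_token"] != prev_refresh'}],
)
# The payload would be POSTed as JSON to your webhook URL; here it is only printed.
print(json.dumps(payload, indent=2))
```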

Two-part scripts: human owns setup, AI owns logic.

Part 1 is yours — fixtures, dependencies, secrets. Part 2 is the AI's — the actual test logic, regenerable from a spec change. The boundary is enforced. AI never touches your auth.

Part 1 · human | Part 2 · AI | resettable | diff-able
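
The two-part boundary can be sketched as a small immutable structure where regeneration only ever replaces Part 2. `TwoPartScript`, `regenerate_part2`, and `part2_diff` are hypothetical names for illustration; the diff uses Python's standard `difflib`.

```python
import difflib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TwoPartScript:
    part1: str  # human-owned: fixtures, dependencies, secrets wiring
    part2: str  # AI-owned: the test logic

    def render(self) -> str:
        """Join both parts into the script that actually runs."""
        return f"{self.part1.rstrip()}\n\n{self.part2.rstrip()}\n"


def regenerate_part2(script: TwoPartScript, new_part2: str) -> TwoPartScript:
    """Swap in new test logic; Part 1 is carried over unchanged."""
    return replace(script, part2=new_part2)


def part2_diff(old: TwoPartScript, new: TwoPartScript) -> str:
    """Unified diff of the AI-owned part only."""
    return "".join(
        difflib.unified_diff(
            old.part2.splitlines(keepends=True),
            new.part2.splitlines(keepends=True),
            fromfile="part2 (before)",
            tofile="part2 (after)",
        )
    )


before = TwoPartScript(
    part1="import pytest\n",
    part2="def test_login():\n    assert True\n",
)
after = regenerate_part2(before, "def test_login():\n    assert 1 + 1 == 2\n")
print(part2_diff(before, after))
```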

MCP-native

Your QA board, exposed to every agent on the planet.

A typed tool registry over SSE. Connect Claude Code to your qa-runner instance and it writes results to the same board your team is staring at — with attribution, script snapshot, and a verifiable trail.

SSE over HTTP · streams agent steps in realtime

Tools are typed and audited · approvals are first-class

Works with Claude Code, Cursor, GitHub Actions, custom pipelines

Read the MCP spec · See tool registry

tools.py

python

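from typing import Literal

# mcp_tool, RunResult, TCSpec and ExecutionResult are defined elsewhere (not shown).
@mcp_tool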
def report_test_result(
    tc_id: str,
    verdict: Literal["pass", "fail", "error"],
    http_status: int,
    duration_ms: int,
    actual_response: dict,
    note: str,
    env: str,
    run_by: str,           # "human" | "agent:{id}"
    script_snapshot: str,  # exact script executed
) -> RunResult: ...

@mcp_tool
def get_tc_context(tc_id: str) -> TCSpec: ...

@mcp_tool
def gen_script(tc_id: str, part: int = 2) -> str: ...

@mcp_tool
def execute_script(
    tc_id: str, timeout: int = 30
) -> ExecutionResult: ...

connect · http://localhost:8000/mcp
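
To show what a caller might assemble before invoking the result-reporting tool, here is a sketch of a payload whose fields mirror the `report_test_result` signature above. The `ResultPayload` class, the `run_by` pattern, and the SHA-256 digest are assumptions added for illustration; they are not part of the published tool registry.

```python
import hashlib
import re
from dataclasses import asdict, dataclass
from typing import Literal

# Assumption: agent ids are word characters, dots, or dashes.
RUN_BY_PATTERN = re.compile(r"^(human|agent:[\w.-]+)$")


@dataclass(frozen=True)
class ResultPayload:
    tc_id: str
    verdict: Literal["pass", "fail", "error"]
    http_status: int
    duration_ms: int
    actual_response: dict
    note: str
    env: str
    run_by: str           # "human" | "agent:{id}"
    script_snapshot: str  # exact script executed

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so validate explicitly.
        if self.verdict not in ("pass", "fail", "error"):
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if not RUN_BY_PATTERN.match(self.run_by):
            raise ValueError(f"run_by must be 'human' or 'agent:<id>': {self.run_by!r}")


def snapshot_digest(payload: ResultPayload) -> str:
    """Hash the snapshot so two runs can be compared without storing a diff."""
    return hashlib.sha256(payload.script_snapshot.encode("utf-8")).hexdigest()


payload = ResultPayload(
    tc_id="TC-004",
    verdict="pass",
    http_status=200,
    duration_ms=184,
    actual_response={"refresh_token": "<redacted>"},
    note="refresh token rotated",
    env="staging",
    run_by="agent:claude-code",
    script_snapshot='assert body["refresh_token"] != prev_refresh\n',
)
print(sorted(asdict(payload)))
print(snapshot_digest(payload)[:12])
```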

Deploy

Cloud or air-gapped — same product, same data model.

Managed cloud

Self-host on Vercel + Railway

Deploy frontend to Vercel, backend to Railway/Fly/Render. Bring your own Anthropic key. Per-target recipes shipped in the repo.

Frontend: Vercel

Backend: Railway · FastAPI + ARQ

Database: Supabase · PostgreSQL

Auth: Email + GitHub OAuth

Deploy recipes

Local · self-hosted

Air-gapped friendly

One command. Docker-compose-based. Agent has full local file/process access — perfect for CI runners and privacy-strict shops.

# docker-compose.yml
services:
  frontend:
    image: qa-runner-frontend
  backend:
    image: qa-runner-backend
    volumes:
      - ./data:/app/data
      - ./scripts:/app/scripts

$ docker compose up

Self-host guide

FAQ

The questions every team asks.

Do I need to know Python?

No. The AI writes Part 2 from a one-sentence description. You can run, edit, or refine without ever opening the script tab. But if you do know Python, you have full pytest underneath — every TC compiles to plain pytest, no proprietary DSL.

Where does the AI run? Is my code sent to Anthropic?

How does this compare to Postman + Newman?

Can I self-host?

Does it block merges on GitHub?

Stop pasting Postman screenshots into Slack.

Spin up a board in two minutes. Connect your first agent in five. Self-host is free, forever — no per-seat pricing, no usage caps.

Start free · Self-host docs