Sharajdev

litmus — System Prompt Tester

Free Chrome Extension

Turn “does this prompt work?” from a gut feeling into a measured result — for plain prompts, tool calls and multi-step agents alike. Paste a system prompt, pick the model you actually ship on, and litmus either scores the output with an auto-written LLM-as-judge rubric or deterministically checks that your tools and agents behave. Every run is kept as a comparable version. Local-first, bring-your-own-key, no backend.

Paste a prompt, pick the model you ship on, and choose what you're testing.
Auto-generated test cases (typical, edge and adversarial) plus tool and agent panels.
Per-case scores with the judge's reasoning and run-to-run variance.
Every run kept: compare versions by dimension and export the history.

What litmus does

litmus is a system-prompt testing Chrome extension for people building on LLMs. Instead of eyeballing a couple of completions and shipping on vibes, you run your prompt against the exact model you deploy on and get a structured, repeatable score back. It lives entirely in a Chrome side panel and runs locally with your own API keys — there is no litmus backend, no account and no tracking.

Local-first & private

litmus is local-first and bring-your-own-key. Your keys, prompts and results are stored only in your browser, and the only network calls are the direct API requests that run your tests against the provider you choose. Tools in agent runs are mocked, so nothing real is executed. There are no analytics, no ads and no account, and a spend cap you set blocks any run that would exceed it. Its Chrome permissions are deliberately minimal: no broad host access and no tabs permission.
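To make the spend-cap idea concrete, here is a minimal sketch of how a pre-run cost check could work. It is an illustration only, not litmus's actual logic: the `Pricing` type, the function names, the upper-bound estimate and the prices are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD; not real provider rates."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_run_cost(cases: int, repeats: int, input_tokens: int,
                      max_output_tokens: int, pricing: Pricing) -> float:
    """Upper-bound estimate: assumes every call uses its full output budget."""
    calls = cases * repeats
    per_call = (input_tokens * pricing.input_per_mtok
                + max_output_tokens * pricing.output_per_mtok) / 1_000_000
    return calls * per_call


def allowed_by_cap(estimated_cost: float, spend_cap: float) -> bool:
    """A run is blocked when its estimate exceeds the user-set cap."""
    return estimated_cost <= spend_cap


# Made-up numbers: 10 cases, 3 repeats each, about 1,000 prompt tokens per call.
pricing = Pricing(input_per_mtok=2.0, output_per_mtok=8.0)
cost = estimate_run_cost(cases=10, repeats=3, input_tokens=1_000,
                         max_output_tokens=500, pricing=pricing)
print(f"estimated ${cost:.2f}, allowed under a $0.10 cap: "
      f"{allowed_by_cap(cost, spend_cap=0.10)}")
```

Estimating from the maximum output length errs on the side of blocking, which seems the safer default for a cap meant to prevent surprise bills.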

Who it's for

Prompt engineers and AI app developers who want to verify a prompt, tool or agent before shipping — without standing up a cloud eval platform. It pairs naturally with AI test-case generation from QAtalyst and AI-assisted localization QA from LingoAI. Free on the Chrome Web Store; works in any Chromium browser.

FAQ

Which models does it support? OpenAI, Anthropic and Google models, selectable in settings, using your own API key. You can set the judge model separately from the target you're testing.

How is output quality scored? litmus auto-writes an LLM-as-judge rubric per quality dimension and generates typical, edge and adversarial test cases, all of which you can edit. Each output is then scored against the rubric, with the judge's reasoning shown.

Can it test tools and agents, not just text? Yes. Tool calls are checked deterministically against your JSON schema, and multi-step agents run against mock tools and are scored on goal completion, tool selection, argument validity, recovery and efficiency — no LLM judge, so results don't drift.
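As a rough illustration of what a deterministic tool-call check can look like, the sketch below validates a call's name and arguments against a small subset of JSON Schema (`required`, `properties`, `type`, `enum`, `additionalProperties`). The function name, the call shape (`name` plus `arguments`) and the `get_weather` tool are hypothetical, and this is not litmus's own code; a real checker would likely use a full JSON Schema validator.

```python
from typing import Any

# Maps JSON Schema type names to Python types (subset only).
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_tool_call(call: dict[str, Any], tool_name: str,
                    schema: dict[str, Any]) -> list[str]:
    """Return a list of failures; an empty list means the call passes."""
    errors: list[str] = []
    if call.get("name") != tool_name:
        errors.append(f"expected tool {tool_name!r}, got {call.get('name')!r}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return errors + ["arguments must be an object"]
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument {key!r}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected argument {key!r}")
            continue
        expected = _TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = (isinstance(value, bool)
                          and spec.get("type") in ("integer", "number"))
        if expected and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"{key!r} should be of type {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key!r} must be one of {spec['enum']}")
    return errors


# Hypothetical tool definition and two model outputs to check.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["c", "f"]},
    },
    "required": ["city"],
    "additionalProperties": False,
}
good = {"name": "get_weather", "arguments": {"city": "Oslo", "unit": "c"}}
bad = {"name": "get_weather", "arguments": {"unit": "kelvin", "days": 3}}

print(check_tool_call(good, "get_weather", schema))  # prints []
for problem in check_tool_call(bad, "get_weather", schema):
    print("FAIL:", problem)
```

Because the check is plain code with no model in the loop, the same call always produces the same verdict.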

How does it handle non-deterministic models? Run each case several times; litmus reports the spread (mean ± range) so run-to-run variance is visible rather than hidden.
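For a sense of the arithmetic, here is a small sketch that summarizes repeated scores for one case. How litmus defines the "±" figure is not stated here, so the half-range used below (half the distance between the lowest and highest score) is an assumption, as are the function name and the sample scores.

```python
from statistics import mean


def summarize(scores: list[float]) -> dict[str, float]:
    """Summarize repeated runs of one case as mean, min, max and half-range."""
    if not scores:
        raise ValueError("need at least one score")
    lo, hi = min(scores), max(scores)
    return {
        "mean": mean(scores),
        "min": lo,
        "max": hi,
        # Assumed reading of "± range": half the distance from min to max.
        "half_range": (hi - lo) / 2,
    }


# Made-up scores from four runs of the same case.
runs = [7.0, 8.0, 9.0, 8.0]
s = summarize(runs)
print(f"{s['mean']:.1f} ± {s['half_range']:.1f} "
      f"(min {s['min']}, max {s['max']})")
# prints: 8.0 ± 1.0 (min 7.0, max 9.0)
```

A case that averages 8.0 with a spread of 1.0 is a different thing from one that averages 8.0 with a spread of 4.0, which is why the spread is worth showing next to the mean.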

Does my data leave the browser? Only to reach the provider you select. litmus is local-first with no backend: keys and prompts are stored on your device and sent only in direct API requests to that provider. Agent tools are mocked rather than executed.

Is it free? Yes, free on the Chrome Web Store.
