testing· Part of Qualixar

AgentAssay

Regression Testing for Non-Deterministic AI Agents

AgentAssay is a statistically rigorous testing framework for AI agents. It replaces the industry’s guess-and-pray approach with hypothesis-driven regression testing that tells you exactly how many trials you need, how confident the result is, and how much it will cost.

The Problem

Testing non-deterministic AI agents is brutally expensive. A single comprehensive test run across model versions, prompt variants, and tool configurations can cost $150–200 in API tokens alone. Teams either over-test and blow their budgets, under-test and ship broken agents, or skip testing entirely and hope for the best. None of these are engineering.

$150–200

average cost per AI agent test run

How It Works

Key Capabilities

Token-Efficient Testing

Achieves 5–20x cost reduction compared to brute-force repetition. Uses adaptive sampling to concentrate compute on the scenarios that actually matter.

Three-Valued Verdicts

Every test returns PASS, FAIL, or INCONCLUSIVE with a calibrated confidence interval. No more binary pass/fail on inherently stochastic systems.

Behavioral Fingerprinting

Builds a statistical signature of each agent’s behavior profile. Detects regressions by comparing fingerprints across versions, reducing required trials by 30–50%.

Adaptive Budget Optimization

Computes the exact minimum number of trials needed per scenario to reach your target confidence level. No wasted tokens, no under-powered conclusions.

pytest Plugin and Framework Adapters

Ships as a pytest plugin with six adapters for popular agent frameworks. Drop it into an existing test suite and start testing agents alongside your unit tests.

Evidence

15,500+

Lines of Code

432

Tests Passing

Page Paper

Formal Theorems

Framework Adapters

~83%

Cost Reduction

Read the Paper →View on Zenodo →

Get Started

Read the Paper GitHub Ecosystem Page Zenodo Dataset

From the Qualixar Suite

AgentAssert

Design-by-Contract for AI Agents

Formal specification and runtime enforcement of behavioral contracts for autonomous AI agents. Prevents drift, ensures compliance, enables composition.

SkillFortify

Supply Chain Security for AI Agent Skills

Static analysis, behavioral sandboxing, and cryptographic attestation for the AI agent skill ecosystem. Detects malicious skills before they execute.

SuperLocalMemory

Information-Geometric Memory for AI Agents

Local-first AI agent memory with mathematical foundations. 74.8% on LoCoMo without cloud dependency — highest local-first score reported. Fisher-Rao retrieval, sheaf cohomology, Langevin lifecycle. EU AI Act compliant.