AgentAssay
Regression Testing for Non-Deterministic AI Agents
AgentAssay is a statistically rigorous testing framework for AI agents. It replaces the industry’s guess-and-pray approach with hypothesis-driven regression testing that tells you exactly how many trials you need, how confident the result is, and how much it will cost.
The Problem
Testing non-deterministic AI agents is brutally expensive. A single comprehensive test run across model versions, prompt variants, and tool configurations can cost $150–200 in API tokens alone. Teams either over-test and blow their budgets, under-test and ship broken agents, or skip testing entirely and hope for the best. None of these is engineering.
Key Capabilities
Token-Efficient Testing
Achieves 5–20x cost reduction compared to brute-force repetition. Uses adaptive sampling to concentrate compute on the scenarios that actually matter.
Three-Valued Verdicts
Every test returns PASS, FAIL, or INCONCLUSIVE with a calibrated confidence interval. No more binary pass/fail on inherently stochastic systems.
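To make the three-valued idea concrete, here is a minimal sketch of how a verdict could be derived from a pass-rate confidence interval. It assumes a Wilson score interval and a required pass rate called threshold; the function names wilson_interval and verdict are hypothetical illustrations and are not AgentAssay's actual API.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def verdict(successes: int, trials: int, threshold: float, z: float = 1.96) -> str:
    """Hypothetical three-valued verdict: compare the whole interval to the threshold."""
    low, high = wilson_interval(successes, trials, z)
    if low >= threshold:
        return "PASS"          # even the pessimistic bound meets the bar
    if high < threshold:
        return "FAIL"          # even the optimistic bound misses the bar
    return "INCONCLUSIVE"      # the interval straddles the threshold


if __name__ == "__main__":
    for passed, total in [(95, 100), (60, 100), (9, 10)]:
        print(passed, total, verdict(passed, total, threshold=0.85))
```

With only 10 trials, a 9/10 result yields an interval too wide to decide against an 85% bar, which is the situation an INCONCLUSIVE verdict is meant to surface instead of a misleading pass.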
Behavioral Fingerprinting
Builds a statistical signature of each agent’s behavior profile. Detects regressions by comparing fingerprints across versions, reducing required trials by 30–50%.
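As an illustration of comparing behavior profiles, the sketch below treats a fingerprint as the relative frequency of tool calls across recorded trials and compares two versions using total variation distance. Both choices are assumptions made for this example; the source does not specify which features or distance AgentAssay uses, and the names fingerprint and total_variation are hypothetical.

```python
from collections import Counter


def fingerprint(traces: list[list[str]]) -> dict[str, float]:
    """Relative frequency of each tool name across all recorded trial traces."""
    counts = Counter(tool for trace in traces for tool in trace)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("traces contain no tool calls")
    return {tool: count / total for tool, count in counts.items()}


def total_variation(a: dict[str, float], b: dict[str, float]) -> float:
    """Total variation distance between two fingerprints (0 = identical, 1 = disjoint)."""
    tools = set(a) | set(b)
    return 0.5 * sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in tools)


if __name__ == "__main__":
    # Hypothetical traces: each inner list is the tool-call sequence of one trial.
    baseline = fingerprint([["search", "answer"], ["search", "answer"]])
    candidate = fingerprint([["answer"], ["answer"]])
    print(total_variation(baseline, candidate))
```

A large distance between versions suggests the agent's behavior has shifted even before outcome-level pass rates are measured, which is one plausible way a fingerprint could reduce the number of trials needed.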
Adaptive Budget Optimization
Computes the exact minimum number of trials needed per scenario to reach your target confidence level. No wasted tokens, no under-powered conclusions.
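A rough sense of how a trial budget can be derived: the sketch below uses the standard normal-approximation sample size formula for a one-sided test that a pass rate has dropped from a baseline to a regressed level. This is a textbook approximation offered for illustration, not necessarily the method AgentAssay implements; trials_needed, estimated_cost, and the per-trial cost figure are hypothetical.

```python
import math
from statistics import NormalDist


def trials_needed(baseline_rate: float, regressed_rate: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate trials needed to detect a drop from baseline_rate to regressed_rate.

    One-sided test, normal approximation to the binomial.
    """
    if not 0 < regressed_rate < baseline_rate < 1:
        raise ValueError("expected 0 < regressed_rate < baseline_rate < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # controls false alarms
    z_beta = NormalDist().inv_cdf(power)       # controls missed regressions
    numerator = (z_alpha * math.sqrt(baseline_rate * (1 - baseline_rate))
                 + z_beta * math.sqrt(regressed_rate * (1 - regressed_rate)))
    return math.ceil((numerator / (baseline_rate - regressed_rate)) ** 2)


def estimated_cost(trials: int, cost_per_trial: float) -> float:
    """Token cost of a scenario, assuming a flat (hypothetical) cost per trial."""
    return trials * cost_per_trial


if __name__ == "__main__":
    n = trials_needed(baseline_rate=0.90, regressed_rate=0.80)
    print(n, estimated_cost(n, cost_per_trial=0.05))
```

The required number of trials grows quickly as the regression you want to detect gets smaller, which is why fixing the effect size and confidence target up front matters for the budget.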
pytest Plugin and Framework Adapters
Ships as a pytest plugin with six adapters for popular agent frameworks. Drop it into an existing test suite and start testing agents alongside your unit tests.
AgentAssert
Design-by-Contract for AI Agents
Formal specification and runtime enforcement of behavioral contracts for autonomous AI agents. Prevents drift, ensures compliance, enables composition.
SkillFortify
Supply Chain Security for AI Agent Skills
Static analysis, behavioral sandboxing, and cryptographic attestation for the AI agent skill ecosystem. Detects malicious skills before they execute.
SuperLocalMemory
Information-Geometric Memory for AI Agents
Local-first AI agent memory with mathematical foundations. Scores 74.8% on LoCoMo without cloud dependency, the highest local-first score reported. Uses Fisher-Rao retrieval, sheaf cohomology, and a Langevin lifecycle. EU AI Act compliant.
