Why LLM-as-judge failed in production
The first generation of agent safety systems added a second model to read the payload and decide whether it looked safe. In practice that approach did not hold up under production load. Judgments were slow, expensive, and easy to manipulate with prompt injection. The same failure modes that affected the primary agent affected the judge.
Operators ended up with either noisy alerts that burned trust or brittle rules that still let dangerous actions slip through. The lesson was simple: the language model should propose actions, and a separate system should decide, deterministically, what is allowed to execute.
From scoring prompts to enforcing grammars
LetsPing treats the firewall as a strict parser and policy engine. High-risk tools do not accept arbitrary strings. They accept structured payloads that can be parsed into either abstract syntax trees or constrained parameter objects. Policies are expressed against those structures, not against prose.
A database tool can be limited to SELECT statements with bounded WHERE clauses on approved views. A Terraform integration can be prevented from creating world-open security groups or wildcard IAM policies. A refund tool can be forced to use positive integer amounts below project thresholds and to target the original charge identifier. The model is free to propose, but only proposals that satisfy these grammars reach downstream systems.
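As a minimal sketch of a constrained parameter object, the refund example above could be validated along these lines. The field names, the `ch_` charge-identifier prefix, and the ceiling are hypothetical, not LetsPing's actual schema:

```python
from dataclasses import dataclass

REFUND_CEILING_CENTS = 50_000  # hypothetical per-project threshold

@dataclass(frozen=True)
class RefundRequest:
    charge_id: str     # identifier of the original charge
    amount_cents: int  # refund amount in minor units

def validate_refund(req: RefundRequest) -> None:
    """Reject any proposal that falls outside the refund grammar."""
    if type(req.amount_cents) is not int:
        raise ValueError("amount must be an integer")
    if req.amount_cents <= 0:
        raise ValueError("amount must be positive")
    if req.amount_cents > REFUND_CEILING_CENTS:
        raise ValueError("amount exceeds project threshold")
    if not req.charge_id.startswith("ch_"):
        raise ValueError("refund must target an original charge identifier")
```

The point is that validation runs against a typed structure the model cannot talk its way around: a proposal either parses and satisfies every rule, or it never reaches the payment system.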
Typed HTTP egress policies
Many real incidents have nothing to do with clever JSON. They are simple HTTP calls to the wrong place. LetsPing intercepts HTTP tools at the point where a request is fully described but not yet executed. It parses the call into method, scheme, host, port, path, query, headers, and a coarse description of the body, and then applies egress policies.
Projects can allowlist domains and schemes and deny link-local, loopback, and private ranges outright. Requests to cloud metadata endpoints are rejected at the resolver regardless of encoding. Whether the agent uses dotted-decimal addresses, integer forms, hexadecimal, or DNS rebinding, the underlying range is treated as off limits. Pattern matching remains available for specific hostnames, but always on top of these typed rules rather than as the first line of defense.
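A minimal sketch of such a typed egress check, using Python's standard `ipaddress` and `urllib.parse` modules. The function names and policy shape are illustrative, not LetsPing's API:

```python
import ipaddress
from urllib.parse import urlsplit

def _literal_ip(host: str):
    """Decode dotted-decimal, integer, and hexadecimal host literals."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        pass
    try:
        return ipaddress.ip_address(int(host, 0))  # handles 2130706433, 0x7f000001
    except ValueError:
        return None

def egress_allowed(url: str, allowed_hosts: set[str]) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    ip = _literal_ip(host)
    if ip is not None:
        # Deny link-local (including cloud metadata), loopback, and
        # private ranges regardless of how the address was encoded.
        if ip.is_link_local or ip.is_loopback or ip.is_private or ip.is_reserved:
            return False
        return str(ip) in allowed_hosts
    # Plain domain: allowlist only. A production resolver would also pin
    # DNS answers to defeat rebinding, which this sketch does not do.
    return host in allowed_hosts
```

Because the check operates on the decoded address rather than the string the agent supplied, `127.0.0.1`, `2130706433`, and `0x7f000001` all fall into the same denied range.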
Context poisoning and trusted inputs
Agents rarely act on prompts alone. They read emails, tickets, documents, web pages, and database rows before deciding which tools to call. That creates a large surface area for indirect prompt injection. An attacker does not need to control the agent. They only need to control part of its environment.
LetsPing distinguishes between trusted and untrusted context sources. Internal systems and authenticated operators can be treated as trusted. Arbitrary user content, scraped pages, and inbound email content are treated as untrusted. Context segments can be tagged accordingly, and policies can require that certain tools are either disabled or always routed through human approval when untrusted context is in play.
This moves the problem from trying to clean up every piece of text to constraining what actions are permitted given a known mix of trusted and untrusted inputs.
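The tagging-and-policy idea can be sketched as follows. The tool names, the policy table, and the default-to-approval rule are assumptions for illustration, not LetsPing's configuration format:

```python
from dataclasses import dataclass
from enum import Enum

class Trust(Enum):
    TRUSTED = "trusted"      # internal systems, authenticated operators
    UNTRUSTED = "untrusted"  # user content, scraped pages, inbound email

@dataclass(frozen=True)
class ContextSegment:
    source: str
    trust: Trust

# Hypothetical policy: how each tool behaves when untrusted context is present.
UNTRUSTED_POLICY = {
    "send_email": "require_approval",
    "delete_records": "disable",
    "search_docs": "allow",
}

def resolve_action(tool: str, context: list[ContextSegment]) -> str:
    """Decide whether a tool call runs, pauses for a human, or is blocked."""
    tainted = any(seg.trust is Trust.UNTRUSTED for seg in context)
    if not tainted:
        return "allow"
    # Unknown tools default to human review rather than silent execution.
    return UNTRUSTED_POLICY.get(tool, "require_approval")
```

Note that the decision depends only on the trust labels, never on the text of the untrusted segments, so an injected instruction cannot argue its way into a more permissive branch.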
Markov as a behavioral lens, not a gate
LetsPing still uses Markov chains, but in a narrow role. Every request is reduced to a small feature vector that includes the service, action, coarse action type, priority bucket, and a structural fingerprint of the payload. A per-agent baseline records how these features usually evolve over time.
When the baseline is stable, the engine computes an anomaly score for each new transition. High scores indicate that the agent is following a path that is unusual relative to its own history. That signal feeds into dashboards and guardrails that can choose to pause or flag specific requests, but the baseline never replaces explicit policies over grammars and egress rules.
In other words, Markov is used to point humans at surprising behavior, not to silently veto or approve actions on its own.
Mapping to OWASP LLM01 through LLM08
- LLM01 Prompt injection and LLM07 insecure plugin design. Grammar-constrained tools and typed HTTP egress policies prevent injected instructions from escalating into direct access to production systems, even when they are embedded in context.
- LLM03 and LLM04 context and training data poisoning. Explicit tagging of trusted and untrusted sources, combined with stricter policies when untrusted context is present, reduces the blast radius of poisoned inputs.
- LLM06 sensitive information disclosure. Structured inspections of destinations and payload fields, rather than simple regex scans, make it harder for exfiltration attempts to hide behind unusual encodings or multi-step flows.
- LLM08 excessive agency. Least privilege rules over grammars ensure that even a misaligned or confused agent cannot exceed its intended authority over data, infrastructure, or financial operations.
Combined with cryptographic identity, escrow, and audit trails, this approach gives teams a clear statement they can take to security review. The firewall is deterministic, explainable, and designed to work in front of any framework that can call an HTTP API.