Sentinel — PureTensor

The gap in production ops

Observability told you something broke. It didn't tell you what to do.

Modern infrastructure generates more signal than humans can triage. Auto-remediation tools react to known patterns. Neither remembers. The cost compounds every quarter.

Alert fatigue

A typical production cluster emits thousands of distinct signals daily. Most are noise. The few that matter get lost. On-call rotations burn out faster than the architecture they protect.

~94% of fired alerts in median clusters are ignored or auto-acked

MTTR economics

Engineers are paged at 03:00 to apply the same remediation that resolved the incident the last three times it happened. Time-to-diagnosis is the largest variable cost in incident response.

62 min median MTTR for repeat-class incidents at mid-stage companies

Knowledge loss

Every fix lives in Slack scrollback, a closed Linear ticket, an engineer's head. When the same failure recurs in six months, the team rediscovers it from scratch. Tribal memory does not survive turnover.

~40% of incident-response context is irretrievable after 12 months

System architecture

Three tiers. One closed loop. Persistent memory.

Sentinel separates the work by tempo. A continuous, low-cost triage tier scans every signal and escalates to a reasoning core only when the pattern is novel. Each successful remediation is committed as an antibody: a permanent immunity that the triage tier recalls the next time the pattern appears.

Tier 1 · Triage

Always-on. Sub-second scan of every signal in the production telemetry stream. Recalls antibodies, applies known remediations. Escalates only when the pattern is genuinely novel.

Tier 2 · Reasoning

Invoked only on novelty. Multi-source diagnosis across logs, metrics, configs, code. Proposes a remediation plan, verifies it in a sandbox, applies it — and commits the resulting antibody.

Tier 3 · Antibody DB

The persistent corpus. Every successful remediation becomes an immunity recalled by Tier 1 on the next occurrence. The system gets faster and cheaper every quarter.
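The closed loop across the three tiers can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Sentinel's implementation: `Signal`, `AntibodyDB`, `fingerprint`, and the `reason` callback are hypothetical names, a plain dict stands in for the persistent corpus, and the "escalated to human" fallback is an assumption about what happens when reasoning finds no fix.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Signal:
    """A single telemetry event (hypothetical shape)."""
    source: str
    kind: str
    detail: str


def fingerprint(signal: Signal) -> str:
    # Assumption: a failure class is identified by its source and event kind.
    return f"{signal.source}:{signal.kind}"


@dataclass
class AntibodyDB:
    """Tier 3 stand-in: maps a fingerprint to a known remediation."""
    remedies: Dict[str, str] = field(default_factory=dict)

    def recall(self, fp: str) -> Optional[str]:
        return self.remedies.get(fp)

    def commit(self, fp: str, remedy: str) -> None:
        self.remedies[fp] = remedy


def handle(
    signal: Signal,
    db: AntibodyDB,
    reason: Callable[[Signal], Optional[str]],
) -> str:
    """Tier 1 triage: recall a known fix, otherwise escalate to Tier 2."""
    fp = fingerprint(signal)
    known = db.recall(fp)
    if known is not None:
        return f"recalled: {known}"
    remedy = reason(signal)  # Tier 2, invoked only on novelty
    if remedy is None:
        return "escalated to human"
    db.commit(fp, remedy)  # the verified fix becomes an antibody
    return f"learned: {remedy}"
```

In this sketch the first occurrence of a failure class pays for one call to `reason`, and every later occurrence with the same fingerprint is answered from the dict without reaching Tier 2.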

Universal by design

One system. Any infrastructure.

Sentinel is agnostic to operating system, orchestrator, and topology. Modern Kubernetes, bare metal, hybrid cloud, on-prem VM estates, embedded edge, post-acquisition spaghetti: the architecture treats them all as inputs.

Point Sentinel at a complex legacy environment nobody fully understands anymore, and within hours it is diagnosing, remediating, and learning the local terrain.

The system ships with a substantial operational corpus distilled from years of production infrastructure work — generic antibodies for the failures that recur across most environments. Once deployed, it begins specialising: capturing your specific config patterns, your unique failure modes, the tribal knowledge that lives in your senior engineers' heads. The generic baseline becomes a specialised local immunity, custom to your stack.

Kubernetes
OpenShift
Nomad
bare metal
VMware
Proxmox
RHEL / Rocky / Ubuntu / Debian
SLES
Windows Server
FreeBSD
AIX / Solaris (read-only mode)
edge / IoT
AWS · Azure · GCP · OCI
on-prem · colo · hybrid
…and whatever else you actually run

Ships with a corpus

Arrives pre-loaded with a substantial operational antibody database covering the failure classes that recur across most production environments. Does not start from zero, does not require months of supervised learning before producing value.

Specialises on contact

Within the first weeks of deployment, Sentinel learns your specific stack: your config patterns, your failure modes, your tribal knowledge. The generic baseline becomes a specialised local corpus, custom to your environment.

Works on what you have

No need to migrate, modernise, or rewrite anything before deployment. Legacy systems, undocumented services, post-acquisition estates that nobody owns anymore — Sentinel learns them as they are. Bring your mess; it adapts.

The differentiator

Every fix becomes a permanent immunity.

Observability vendors show you what broke. Auto-remediation vendors react to a fixed playbook. Sentinel's antibody database is the moat — a growing corpus of fingerprinted failures, verified remediations, and recall triggers that compounds with every incident.
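One way to picture an antibody entry is as a small record holding the fingerprinted trigger, the verified action, and recall statistics. The sketch below is an assumption about shape only: the `Antibody` class and its field names are hypothetical, chosen to mirror the trigger, action, invoked, and last fields shown on the antibody cards on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Antibody:
    """Hypothetical record for one entry in the antibody corpus."""
    antibody_id: str
    category: str
    title: str
    trigger: str
    action: str
    invoked: int = 0
    last: Optional[datetime] = None

    def record_invocation(self, when: datetime) -> None:
        # Each successful recall increments the counter and updates the timestamp.
        self.invoked += 1
        self.last = when

    def card_footer(self) -> str:
        # Render the two statistics lines in the same style as the cards.
        stamp = self.last.strftime("%Y-%m-%d %H:%M UTC") if self.last else "never"
        return f"invoked: {self.invoked}×\nlast: {stamp}"
```

A record built this way keeps the trigger and action as plain strings; a real corpus would presumably store structured matchers and executable steps instead.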

AB-0142 STORAGE

Stale RBD lock blocking pod startup

Triggered when a pod fails to mount a Ceph RBD volume because the previous mounter died without releasing the lock. Sentinel verifies no live mounter, releases the lock, retries the mount.

trigger: FailedMount + RBD watcher absent
action: rbd lock rm → kubelet retry
invoked: 47×
last: 2026-05-22 14:08 UTC
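The safety check in this antibody (release the lock only when no live mounter exists) can be sketched as a pure planning function. This is a minimal sketch, not Sentinel's code: `plan_lock_release` is a hypothetical helper, and its inputs are assumed to have been parsed already from `rbd status` (watchers) and `rbd lock ls` (locks). It only builds the `rbd lock rm` command lines and does not execute them.

```python
from typing import List, Tuple


def plan_lock_release(
    image: str,
    watchers: List[str],
    locks: List[Tuple[str, str]],
) -> List[List[str]]:
    """Return the `rbd lock rm` commands to run, or [] if a mounter is alive.

    image:    image spec such as "pool/image" (illustrative value).
    watchers: client addresses currently watching the image.
    locks:    (lock_id, locker) pairs currently held on the image.
    """
    if watchers:
        # A live watcher means the lock may be legitimate: do nothing.
        return []
    # No watcher: every remaining lock is treated as stale in this sketch.
    return [
        ["rbd", "lock", "rm", image, lock_id, locker]
        for lock_id, locker in locks
    ]
```

Keeping the decision separate from execution makes the "verify no live mounter" step easy to test without a Ceph cluster.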

AB-0089 KUBE

kube-proxy wedge on storage tier nodes

Detected via service connectivity probe failures from a specific node subset. Sentinel restarts the k3s-agent unit, validates iptables rule propagation, re-runs the probe.

trigger: svc connectivity probe fail × 3
action: systemctl restart k3s-agent
invoked: 12×
last: 2026-05-19 03:47 UTC
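The "probe fail × 3" trigger reads naturally as a consecutive-failure counter. The sketch below assumes that interpretation: `ConsecutiveFailureTrigger` is a hypothetical class, and the choice to reset the counter after firing (so the trigger fires once per run of failures) is an assumption, not documented behaviour.

```python
class ConsecutiveFailureTrigger:
    """Fire once when `threshold` probe failures occur in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when the trigger fires."""
        if probe_ok:
            # Any success breaks the run of failures.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            # Reset so the same run of failures does not fire repeatedly.
            self.failures = 0
            return True
        return False
```

Requiring consecutive failures, rather than any three failures in a window, keeps a single flapping probe from restarting the agent unnecessarily.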

AB-0203 SYSTEMD

Systemd unit corruption from line-continuation parse

Common after in-place package upgrades that rewrite ExecStart with literal backslash-space sequences. Sentinel diffs against the healthy peer node, scp's the clean unit file, daemon-reloads.

trigger: service start failure + grep "\ "
action: scp from healthy peer + reload
invoked: 8×
last: 2026-05-21 22:14 UTC
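The detection half of this antibody can be approximated with a short scan of the unit file. This sketch assumes one plausible reading of the trigger: a line-continuation backslash that is followed by trailing whitespace, so it no longer sits at the end of the line. `find_broken_continuations` is a hypothetical helper, not part of systemd or Sentinel.

```python
import re
from typing import List

# A backslash followed by spaces or tabs up to the end of the line.
_BROKEN_CONTINUATION = re.compile(r"\\[ \t]+$")


def find_broken_continuations(unit_text: str) -> List[int]:
    """Return 1-based line numbers whose continuation backslash is not last."""
    return [
        number
        for number, line in enumerate(unit_text.splitlines(), start=1)
        if _BROKEN_CONTINUATION.search(line)
    ]


# Illustrative unit text: line 2 has a space after the backslash.
broken_unit = "[Service]\nExecStart=/usr/bin/app \\ \n  --flag\nRestart=always\n"
healthy_unit = "[Service]\nExecStart=/usr/bin/app \\\n  --flag\nRestart=always\n"
```

An empty result for a peer node's copy of the same unit is a reasonable precondition before treating that peer as the healthy source.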

AB-0117 OBSERV

Monitoring targets pointing at decommissioned hosts

Triggered by ServiceMonitor scrape failures with NXDOMAIN errors. Sentinel cross-references active fleet inventory, prunes stale targets, regenerates the rule manifest.

trigger: scrape NXDOMAIN × N
action: prune ServiceMonitor endpoints
invoked: 5×
last: 2026-05-18 11:32 UTC
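The cross-reference step amounts to comparing scrape targets against the fleet inventory. The following is a minimal sketch: `split_targets` is a hypothetical helper, targets are assumed to be `host:port` strings, and the inventory is assumed to be a set of hostnames. IPv6 literals are not handled.

```python
from typing import Iterable, List, Set, Tuple


def split_targets(
    targets: Iterable[str],
    inventory: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split scrape targets into (keep, stale) based on the active inventory."""
    keep: List[str] = []
    stale: List[str] = []
    for target in targets:
        # Drop the port, if any, to get the hostname.
        host = target.rsplit(":", 1)[0]
        if host in inventory:
            keep.append(target)
        else:
            stale.append(target)
    return keep, stale
```

Returning the stale list alongside the kept targets leaves a record of what was pruned, which is useful if a host was missing from inventory by mistake.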

AB-0188 STORAGE

CSI plugin stale node registration

After a node rotation, the storage CSI driver retains a stale node ID. Volume attachments hang indefinitely. Sentinel deletes the driver pod to force re-registration.

trigger: VolumeAttachment stuck > 90s
action: kubectl delete pod ceph-csi
invoked: 3×
last: 2026-05-20 06:55 UTC
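The "stuck > 90s" trigger can be sketched as a filter over attachment records. The `Attachment` dataclass here is a simplified, hypothetical stand-in for a Kubernetes VolumeAttachment (only a name, an attached flag, and a creation time), and `stuck_attachments` is an illustrative helper rather than Sentinel's detector.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Attachment:
    """Simplified view of a volume attachment (hypothetical shape)."""
    name: str
    attached: bool
    created: datetime


def stuck_attachments(
    items: Iterable[Attachment],
    now: datetime,
    threshold: timedelta = timedelta(seconds=90),
) -> List[str]:
    """Return names of attachments still unattached after `threshold`."""
    return [
        item.name
        for item in items
        if not item.attached and now - item.created > threshold
    ]
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.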

AB-0226 KUBE

Image arch mismatch on heterogeneous nodes

An amd64-only image is scheduled on the cluster's arm64 node and enters CrashLoopBackOff with "exec format error". Sentinel patches the deployment with a nodeAffinity exclusion and notifies on the missing multi-arch manifest.

trigger: "exec format error" in pod log
action: patch nodeAffinity exclusion
invoked: 2×
last: 2026-05-15 17:21 UTC
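A nodeAffinity exclusion of this kind can be expressed as a patch body for the Deployment. The sketch below builds that body as a plain dict. It assumes the exclusion is keyed on the standard `kubernetes.io/arch` node label and applied as a required scheduling rule; `arch_exclusion_patch` is a hypothetical helper, and how the patch is submitted is left out.

```python
from typing import Any, Dict


def arch_exclusion_patch(excluded_arch: str) -> Dict[str, Any]:
    """Build a Deployment patch that keeps pods off nodes of one architecture."""
    expression = {
        "key": "kubernetes.io/arch",  # well-known node label
        "operator": "NotIn",
        "values": [excluded_arch],
    }
    return {
        "spec": {
            "template": {
                "spec": {
                    "affinity": {
                        "nodeAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": {
                                "nodeSelectorTerms": [
                                    {"matchExpressions": [expression]}
                                ]
                            }
                        }
                    }
                }
            }
        }
    }
```

For the case on this card the call would be `arch_exclusion_patch("arm64")`. The exclusion is a stopgap; publishing a multi-arch manifest removes the need for it.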

Built on

Frontier models for reasoning. Sovereign compute for control.

Sentinel runs against any production stack — Kubernetes, bare-metal, hybrid. The reasoning core is hyperscaler-portable. The triage tier runs on commodity inference, including local vLLM for air-gapped deployments.

Claude Haiku 4.5

triage · agentic

Claude Opus 4.7

reasoning · novelty

Amazon Bedrock

model serving

Bedrock AgentCore

agent platform

K3s / Kubernetes

orchestration

Ceph

distributed storage

vLLM (optional)

local · air-gap

pgvector + JetStream

antibody corpus

Sovereign-capable

Sentinel can run entirely on customer-owned compute. The reasoning core can be swapped to a local model served through vLLM or llama.cpp for regulated, air-gapped, or data-residency-constrained environments. No mandatory egress to a hyperscaler.

Production status

Running today. Not pre-product.

Sentinel V2 entered production on PureTensor's Trinity cluster on 2026-05-18. The numbers below are real, from the live system. We are not pre-product. We are pre-customer.

Nodes monitored

Antibodies accumulated

Auto-resolved incidents

Triage tier availability

LIVE · trinity cluster

Continuous operation since 2026-05-18. Tier 1 scans 4.2M signals/day. Median escalation rate to Tier 2: 0.018%. Median antibody recall hit-rate: 91.3% on familiar patterns.
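For a sense of scale, the stated figures can be turned into rough rates. This is back-of-envelope arithmetic only: it treats the median escalation rate as if it applied uniformly to the daily signal volume, which is an assumption, so the results are approximations rather than reported measurements.

```python
SIGNALS_PER_DAY = 4_200_000      # Tier 1 volume stated above
ESCALATION_RATE_PCT = 0.018      # median escalation rate to Tier 2, in percent
SECONDS_PER_DAY = 24 * 60 * 60

# Average Tier 1 throughput implied by the daily volume.
signals_per_second = SIGNALS_PER_DAY / SECONDS_PER_DAY

# Approximate number of signals reaching Tier 2 per day.
escalations_per_day = SIGNALS_PER_DAY * ESCALATION_RATE_PCT / 100

print(round(signals_per_second, 1))  # 48.6
print(round(escalations_per_day))    # 756
```

On these assumptions the reasoning tier sees on the order of a few hundred signals a day, against roughly 49 signals per second at the triage tier.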

What's next

Roadmap.

Internal validation first. Design partners next. Managed offering once the antibody corpus is portable across customer infrastructures.

Now · in production

V2 internal — PureTensor fleet

Sentinel runs autonomously across the Trinity cluster (Kubernetes, Ceph, bare-metal compute, monitoring tier). The antibody corpus accrues from real operational incidents. Onboarding a node reduces median engineer pages for that node by 60–80%.

Q3 2026

Design-partner pilot — 3 customers

Selective onboarding of three operational design partners running Kubernetes at meaningful scale. Co-engineered antibody portability, shared corpus modes, customer-specific safety policies. Tight feedback loop on the human-in-the-loop boundary.

2027

Managed Sentinel

Generally available as a managed control plane. Bring-your-own-cloud or fully managed. Antibody corpus federation with cryptographic provenance. Per-incident pricing model.

Infrastructure that diagnoses, heals, and remembers.