Status: Not started
Goal: Build the capabilities that take the homelab from "well-run cluster" to a platform engineering practice.
Addresses: K12, K13 (no chaos testing, no supply chain verification)
6.1 Staging Cluster
| | |
|---|---|
| Why | Every infrastructure change is currently tested directly in production. A staging cluster makes it possible to validate changes and rehearse DR procedures without putting production workloads at risk. |
| Prerequisites | Second compute host (Phase 4.1). |
6.2 Runtime Security with Falco
| | |
|---|---|
| Why | Current security controls operate at admission time (Kyverno) and at the network layer (Cilium). Nothing monitors what happens inside a running container. Falco detects runtime events such as a shell spawned in a container, a read of a sensitive file, an unexpected network connection, or a privilege escalation. |
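
As a rough illustration of what consuming Falco's runtime detections might look like, the sketch below triages alerts from Falco's JSON output by priority. It assumes Falco is configured to emit JSON lines with `rule` and `priority` fields; the `should_page` helper, the `Warning` threshold, and the sample events are hypothetical and not part of this plan.

```python
import json

# Falco priority levels, most severe first.
PRIORITIES = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]


def parse_alert(line):
    """Parse one line of Falco JSON output; return None for anything else."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or "rule" not in event:
        return None
    return event


def should_page(event, threshold="Warning"):
    """Hypothetical policy: page when the alert is at least as severe as threshold."""
    priority = str(event.get("priority", "")).capitalize()
    if priority not in PRIORITIES:
        return False
    return PRIORITIES.index(priority) <= PRIORITIES.index(threshold)


# Illustrative events only; real alerts carry more fields (time, output_fields).
SAMPLE_LINES = [
    '{"rule": "Terminal shell in container", "priority": "Notice", "output": "illustrative"}',
    '{"rule": "Read sensitive file untrusted", "priority": "Warning", "output": "illustrative"}',
    "not json at all",
]

if __name__ == "__main__":
    for raw in SAMPLE_LINES:
        alert = parse_alert(raw)
        if alert is None:
            continue
        action = "page" if should_page(alert) else "log"
        print(f"{action}: {alert['rule']} ({alert['priority']})")
```

Keeping the threshold as a parameter makes it easy to start noisy (log everything) and tighten paging once the baseline of normal container behavior is understood.
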
6.3 Chaos Engineering
| | |
|---|---|
| Why | DR runbooks exist but are never automatically validated. Chaos experiments prove the cluster recovers as documented and expose gaps before real incidents find them. |
| Cadence | Start with pod-kill experiments. Escalate to node-drain and network-partition tests. |
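
The cadence above can be sketched as an escalation ladder plus a blast-radius guard for the first stage. This is a minimal sketch with no cluster access: the `Pod` record, `pick_victim`, `next_stage`, and the `min_ready_after` guard are hypothetical names for illustration, and a real experiment would be driven by a chaos tool or the Kubernetes API.

```python
import random
from dataclasses import dataclass

# Escalation order taken from the cadence: simplest failure first.
STAGES = ("pod-kill", "node-drain", "network-partition")


@dataclass(frozen=True)
class Pod:
    name: str
    workload: str
    ready: bool


def pick_victim(pods, workload, min_ready_after=1, rng=None):
    """Choose one ready pod to kill, or None if that would breach the guard.

    min_ready_after is an assumed safety margin: the number of ready
    replicas that must remain once the victim is gone.
    """
    rng = rng or random.Random()
    ready = [p for p in pods if p.workload == workload and p.ready]
    if len(ready) - 1 < min_ready_after:
        return None
    return rng.choice(ready)


def next_stage(current, recovered):
    """Advance one stage only after recovery was confirmed at the current one."""
    if not recovered:
        return current
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]


if __name__ == "__main__":
    pods = [
        Pod("web-0", "web", True),
        Pod("web-1", "web", True),
        Pod("db-0", "db", True),
    ]
    print(pick_victim(pods, "web", rng=random.Random(0)))
    print(pick_victim(pods, "db"))  # single replica, guard refuses
    print(next_stage("pod-kill", recovered=True))
```

Refusing to act when the guard would be breached keeps early experiments from turning into the outage they are meant to rehearse.
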
6.4 Supply Chain Security
| | |
|---|---|
| Why | Renovate pins image digests, which protects against tag mutation, but nothing verifies that images were built by trusted parties. cosign verifies that images carry a valid signature from their maintainers. Harbor adds vulnerability scanning and image caching. |
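
To make the distinction between pinning and verification concrete, the sketch below first checks that an image reference is digest-pinned and then builds (without running) a `cosign verify --key` command for it. It assumes key-based signing; the helper names, the example image, and the key path are hypothetical, and maintainers who sign keylessly would need different cosign options.

```python
import re

# A digest-pinned reference looks like repo[:tag]@sha256:<64 hex chars>.
DIGEST_RE = re.compile(r"^(?P<repo>[^@\s]+)@sha256:(?P<hex>[0-9a-f]{64})$")


def is_digest_pinned(ref):
    """True when the reference is pinned to an immutable sha256 digest."""
    return DIGEST_RE.match(ref) is not None


def cosign_verify_argv(ref, public_key_path):
    """Build the argv for a key-based cosign verification.

    Only the command is constructed here; running it (for example with
    subprocess.run) is left to the caller.
    """
    if not is_digest_pinned(ref):
        raise ValueError(f"refusing to verify a mutable reference: {ref}")
    return ["cosign", "verify", "--key", public_key_path, ref]


if __name__ == "__main__":
    # Hypothetical image and key path, for illustration only.
    pinned = "ghcr.io/example/app:1.2.3@sha256:" + "a" * 64
    print(is_digest_pinned(pinned))
    print(is_digest_pinned("ghcr.io/example/app:1.2.3"))
    print(" ".join(cosign_verify_argv(pinned, "cosign.pub")))
```

Verifying the digest rather than the tag ties the signature check to the exact bytes that Renovate pinned.
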
Definition of Done