Skip to content

Architecture Overview

This document provides a high-level view of the homelab infrastructure, covering the full provisioning pipeline, component inventory, and naming conventions.

Provisioning Pipeline

The homelab is provisioned through a multi-stage pipeline that takes bare-metal hardware to a fully operational Kubernetes cluster running production-grade workloads.

flowchart LR
    subgraph stage1["Stage 1: Hypervisor"]
        ansible1["Ansible"] --> proxmox["Proxmox VE"]
    end

    subgraph stage2["Stage 2: VM Template"]
        packer["Packer"] --> template["K8s Node Template"]
    end

    subgraph stage3["Stage 3: Virtual Machines"]
        terraform["Terraform"] --> vms["VM Instances"]
    end

    subgraph stage4["Stage 4: Kubernetes"]
        ansible2["Ansible"] --> kubeadm["kubeadm Cluster"]
    end

    subgraph stage5["Stage 5: Workloads"]
        argocd["ArgoCD"] --> apps["Applications"]
    end

    stage1 --> stage2
    stage2 --> stage3
    stage3 --> stage4
    stage4 --> stage5

Stage 1 -- Hypervisor Provisioning: Ansible configures the Proxmox VE hypervisor nodes, managing host-level settings, storage pools, and network bridges.

Stage 2 -- VM Template: Packer builds a K8s-ready Ubuntu 24.04 VM template on Proxmox using the proxmox-iso builder with Ubuntu autoinstall. The template is provisioned with Ansible roles (base, k8s_prereqs, nfs, igpu) so every node cloned from it already has the container runtime, kubeadm, NFS client, and iGPU drivers installed.

Stage 3 -- VM Provisioning: Terraform clones the Packer-built template to provision VMs, assigning per-node compute resources, IP addresses, and PCI device passthrough via cloud-init.

Stage 4 -- Kubernetes Bootstrap: Ansible bootstraps the kubeadm-based Kubernetes cluster, handling NFS mounts, iGPU device verification, control plane initialization, worker node joins, and CNI (Cilium) deployment. Roles already baked into the template are conditionally skipped.

Stage 5 -- Workload Deployment: ArgoCD manages all cluster workloads declaratively via GitOps. An ApplicationSet with a Git File Generator discovers per-app config.yaml files and generates independent Applications for each infrastructure component and application.

Component Inventory

Component Role Namespace
Cilium Container Network Interface (CNI) kube-system
ArgoCD GitOps continuous delivery argocd
Cilium Gateway API Gateway controller + L2 LoadBalancer IP allocation default
cert-manager TLS certificate management (self-signed CA) cert-manager
Vault Centralized secrets backend (KV v2) vault
External Secrets Operator Syncs Vault secrets into K8s Secrets external-secrets
NFS Provisioner Dynamic NFS-backed PVC provisioning nfs-provisioner
Metrics Server Kubernetes resource metrics API kube-system
MinIO S3-compatible object storage for backups backups
Intel GPU Operator Intel GPU device driver management intel-gpu-operator
Intel GPU Plugin Intel iGPU device plugin for workloads intel-gpu-operator
kube-prometheus-stack Prometheus, Grafana, Alertmanager, Node Exporter, kube-state-metrics monitoring
Loki Log aggregation (single-binary mode) monitoring
Velero Cluster and volume backup/restore backups
Alloy DaemonSet log collector monitoring
Authentik SSO provider (forward-auth + OIDC) auth
Reloader Automatic pod restarts on ConfigMap/Secret changes kube-system
Kyverno Kubernetes policy engine (admission control) kyverno
Descheduler Pod rebalancing across nodes (CronJob) kube-system
Jellyfin Media server arr
Sonarr TV series management arr
Radarr Movie management arr
Prowlarr Indexer management arr
Bazarr Subtitle management arr
Seerr Media request management arr
qBittorrent Torrent client (via Gluetun VPN sidecar) arr
Recyclarr Quality profile sync (CronJob) arr
Tdarr Media transcoding arr
Exportarr Prometheus exporter for *arr app metrics arr
Homepage Dashboard arr
Uptime Kuma Synthetic monitoring and status page monitoring
OpenClaw AI agents for cluster ops and media management openclaw

Naming Conventions

Consistent naming across the infrastructure simplifies management, documentation, and troubleshooting.

Pattern Example Description
homelabpve## homelabpve01 Proxmox VE hypervisor nodes
homelabk8s## homelabk8s01 Kubernetes cluster identifiers
cluster-node-# homelabk8s01-node-1 Individual Kubernetes nodes within a cluster

Repository Structure

The repository is organized by tool and cluster:

homelab/
  packer/            # VM template builds
    k8s-node/        # K8s node template (Ubuntu 24.04 + autoinstall)
  ansible/           # Playbooks for Proxmox and K8s provisioning
  terraform/         # VM provisioning on Proxmox
  k8s/
    bootstrap/       # ArgoCD bootstrap and ApplicationSet
    clusters/
      homelabk8s01/  # Cluster-specific ArgoCD Applications
        apps/        # Application workloads
        infrastructure/  # Infrastructure components
  docs/              # MkDocs documentation

Single Source of Truth

The Git repository is the single source of truth for all cluster state. Manual changes made directly to the cluster will be detected and reverted by ArgoCD's automated sync with pruning and self-healing enabled.