
Switch away from Hetzner's buggy CSI to JuiceFS

+305 -15
+4
CHANGELOG.md
···
 - Add tarpit for vulnerability scanners hitting known exploit paths (#18)

 ### Added
+- Migrate to 3-node HA cluster with JuiceFS and S3-backed storage (#38)
+- Add JuiceFS Redis and CSI manifests for S3-backed storage (#47)
 - Add backup restoration guide for PDS and knot (#35)
 - Create vesper and nyx accounts on PDS (#31)
 - Add daily S3 backup cronjob for Tangled knot data (#9)
 - Add Tangled knot with Spindle CI/CD to k3s cluster (#1)

 ### Fixed
+- Remove deleted pds-test subdomain from TLS certificate (#48)
 - Restore PDS and knot data from S3 backups (#34)
 - Fix backup script to prevent empty source from wiping S3 data (#33)
 - Add PDS handle resolution for vesper and nyx subdomains (#32)
···
 - Update PDS to v0.4.208 for OAuth metadata support (#13)

 ### Changed
+- Remove IP allowlist restriction from kube API and SSH firewall (#49)
 - Add health check that detects SQLite locking failures (#16)
 - Move node SSH to port 2222 and expose knot Git SSH on port 22 (#14)
 - Update knot hostname from git.sans-self.org to knot.sans-self.org (#12)
+16
Makefile
···
+# Derived secrets — generated from source secrets before kustomize build
+JUICEFS_METAURL := k8s/juicefs/metaurl.secret
+
+.PHONY: secrets clean-secrets build
+
+secrets: $(JUICEFS_METAURL)
+
+$(JUICEFS_METAURL): k8s/juicefs/redis-password.secret
+	@pw=$$(cat $< | tr -d '\n') && \
+	printf 'redis://:%s@redis.juicefs.svc.cluster.local:6379/0' "$$pw" > $@
+
+build: secrets
+	kustomize build k8s/
+
+clean-secrets:
+	rm -f $(JUICEFS_METAURL)
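The Makefile rule strips newlines from the password file and writes the Redis metadata URL with `printf`, which emits no trailing newline. The same derivation can be sketched as a standalone shell function, assuming the same host, port, and database index as the rule. The name `build_metaurl` is hypothetical and not part of the repo. Like the Makefile, this sketch does not percent-encode the password, so a password containing characters such as `@` or `/` may need extra handling; that is an assumption about how the URI is parsed downstream, not something the diff confirms.

```sh
#!/bin/sh
# Sketch: derive the JuiceFS metadata URL from a password file, mirroring
# the Makefile rule. build_metaurl is a hypothetical helper name.
build_metaurl() {
    # $1: path to the file holding the Redis password.
    # tr removes every newline, so a trailing one cannot leak into the URL.
    pw=$(tr -d '\n' < "$1") || return 1
    # The format string has no "\n", so the output has no final newline.
    printf 'redis://:%s@redis.juicefs.svc.cluster.local:6379/0' "$pw"
}
```

Usage would look like `build_metaurl k8s/juicefs/redis-password.secret > k8s/juicefs/metaurl.secret`, which is roughly what `make secrets` does.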
+56
README.md
···
+# infrastructure
+
+Single-tenant AT Protocol infrastructure on Hetzner Cloud. Runs a [PDS](https://github.com/bluesky-social/pds) (sans-self.org) and a [Tangled knot](https://tangled.sh) (knot.sans-self.org) on a k3s cluster.
+
+## Stack
+
+- **Infra**: OpenTofu (kube-hetzner module v2.18.5) + Hetzner Cloud
+- **Cluster**: k3s with 3x CAX11 ARM nodes (HA embedded etcd)
+- **Storage**: JuiceFS CSI backed by Hetzner Object Storage (S3), Redis metadata
+- **Secrets**: git-crypt
+- **Manifests**: Kustomize
+
+## Secrets
+
+All files matching `k8s/**/*.secret` are git-crypt encrypted. Unlock before building:
+
+```sh
+git-crypt unlock
+```
+
+`k8s/juicefs/metaurl.secret` is derived from `k8s/juicefs/redis-password.secret` — it embeds the Redis password in a connection URI. If you rotate the password, regenerate it:
+
+```sh
+make secrets
+```
+
+The password file must **not** have a trailing newline. The Makefile handles this correctly.
+
+## Deploy
+
+Manifests:
+
+```sh
+kubectl apply -k k8s/
+```
+
+JuiceFS CSI driver is managed by OpenTofu as a `helm_release` resource:
+
+```sh
+tofu apply
+```
+
+## Cluster rebuild
+
+If the cluster needs to be rebuilt from scratch (new nodes, not just config changes):
+
+1. The first control plane node must bootstrap with `cluster-init: true` in `/etc/rancher/k3s/config.yaml` — kube-hetzner doesn't handle this automatically when all nodes are new.
+2. Hetzner can't shrink disks. Switching to a smaller server type (e.g. cx33 → cax11) requires tainting the server resource: `tofu taint 'module.kube-hetzner.module.control_planes["0-0-control-plane"].hcloud_server.server'`
+3. After bootstrap, fetch a fresh kubeconfig from the node — the one in tofu state will have the wrong CA.
+4. JuiceFS CSI on SELinux (MicroOS) requires `sidecarPrivileged: true` in `juicefs-csi-values.yaml` under `node:`. Without it, the CSI socket has a label mismatch and sidecars can't connect.
+
+## Backups
+
+Daily S3 backups via CronJobs (02:00 PDS, 02:30 knot). See [RESTORE.md](RESTORE.md) for recovery procedures.
+
+After a PDS restore, the sequencer autoincrement must be bumped past the relay's cursor — see RESTORE.md section "Fix sequencer cursor".
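The README states that the password file must not end with a newline. The Makefile strips newlines when deriving `metaurl.secret`, but the `secretGenerator` in `k8s/juicefs/kustomization.yaml` reads `redis-password.secret` directly, so checking the file itself may be worthwhile. A minimal sketch of such a check follows; `has_trailing_newline` is a hypothetical helper name, not something the repo provides.

```sh
#!/bin/sh
# Sketch: report whether a file ends with a newline byte.
# has_trailing_newline is a hypothetical helper name.
has_trailing_newline() {
    # An empty (or missing) file has no trailing newline.
    [ -s "$1" ] || return 1
    # Command substitution strips trailing newlines, so the last byte
    # comes back as an empty string when that byte is a newline.
    [ -z "$(tail -c 1 "$1")" ]
}
```

For example, `has_trailing_newline k8s/juicefs/redis-password.secret && echo "strip the newline first"` would flag a file that needs fixing before `make secrets`.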
+1 -1
dns.tf
···
 locals {
-  cluster_ip = module.kube-hetzner.control_planes_public_ipv4[0]
+  cluster_ip = module.kube-hetzner.ingress_public_ipv4
 }

 resource "hcloud_zone" "sans_self" {
homelab_kubeconfig.yaml

This is a binary file and will not be displayed.

+51
juicefs-csi-values.yaml
···
+image:
+  repository: juicedata/juicefs-csi-driver
+  tag: "v0.31.2"
+
+mountMode: mountpod
+
+globalConfig:
+  enabled: true
+  manageByHelm: true
+  mountPodPatch:
+    - resources:
+        limits:
+          cpu: 1000m
+          memory: 1Gi
+        requests:
+          cpu: 100m
+          memory: 256Mi
+
+controller:
+  enabled: true
+  replicas: 1
+  provisioner: true
+  resources:
+    limits:
+      cpu: 500m
+      memory: 512Mi
+    requests:
+      cpu: 50m
+      memory: 128Mi
+
+node:
+  enabled: true
+  sidecarPrivileged: true
+  resources:
+    limits:
+      cpu: 500m
+      memory: 512Mi
+    requests:
+      cpu: 50m
+      memory: 128Mi
+
+dashboard:
+  enabled: false
+
+webhook:
+  certManager:
+    enabled: false
+
+storageClasses:
+  - name: juicefs-sc
+    enabled: false
+19
k8s/juicefs/kustomization.yaml
···
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+resources:
+  - namespace.yaml
+  - redis-pvc.yaml
+  - redis-deployment.yaml
+  - redis-service.yaml
+  - storageclass.yaml
+
+generatorOptions:
+  disableNameSuffixHash: true
+
+secretGenerator:
+  - name: redis-credentials
+    namespace: juicefs
+    type: Opaque
+    files:
+      - password=redis-password.secret
k8s/juicefs/metaurl.secret

This is a binary file and will not be displayed.

+4
k8s/juicefs/namespace.yaml
···
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: juicefs
+64
k8s/juicefs/redis-deployment.yaml
···
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: redis
+  namespace: juicefs
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      app: juicefs-redis
+  template:
+    metadata:
+      labels:
+        app: juicefs-redis
+    spec:
+      containers:
+        - name: redis
+          image: redis:7-alpine
+          command: ["redis-server"]
+          args:
+            - --requirepass
+            - $(REDIS_PASSWORD)
+            - --appendonly
+            - "yes"
+            - --appendfsync
+            - everysec
+            - --maxmemory
+            - 64mb
+            - --maxmemory-policy
+            - noeviction
+          env:
+            - name: REDIS_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: redis-credentials
+                  key: password
+          ports:
+            - containerPort: 6379
+          volumeMounts:
+            - name: data
+              mountPath: /data
+          resources:
+            requests:
+              cpu: 50m
+              memory: 64Mi
+            limits:
+              cpu: 200m
+              memory: 128Mi
+          livenessProbe:
+            exec:
+              command: ["redis-cli", "-a", "$(REDIS_PASSWORD)", "ping"]
+            initialDelaySeconds: 5
+            periodSeconds: 30
+          readinessProbe:
+            exec:
+              command: ["redis-cli", "-a", "$(REDIS_PASSWORD)", "ping"]
+            initialDelaySeconds: 3
+            periodSeconds: 10
+      volumes:
+        - name: data
+          persistentVolumeClaim:
+            claimName: redis-data
k8s/juicefs/redis-password.secret

This is a binary file and will not be displayed.

+12
k8s/juicefs/redis-pvc.yaml
···
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: redis-data
+  namespace: juicefs
+spec:
+  accessModes:
+    - ReadWriteOnce
+  storageClassName: local-path
+  resources:
+    requests:
+      storage: 1Gi
+11
k8s/juicefs/redis-service.yaml
···
+apiVersion: v1
+kind: Service
+metadata:
+  name: redis
+  namespace: juicefs
+spec:
+  selector:
+    app: juicefs-redis
+  ports:
+    - port: 6379
+      targetPort: 6379
+15
k8s/juicefs/storageclass.yaml
···
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: juicefs-sc
+provisioner: csi.juicefs.com
+parameters:
+  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
+  csi.storage.k8s.io/provisioner-secret-namespace: juicefs
+  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
+  csi.storage.k8s.io/node-publish-secret-namespace: juicefs
+reclaimPolicy: Retain
+allowVolumeExpansion: true
+mountOptions:
+  - cache-size=2048
+  - buffer-size=300
+1 -1
k8s/knot/pvc.yaml
···
 spec:
   accessModes:
     - ReadWriteOnce
-  storageClassName: hcloud-volumes
+  storageClassName: juicefs-sc
   resources:
     requests:
       storage: 10Gi
+13
k8s/kustomization.yaml
···

 resources:
   - shared/cluster-issuer.yaml
+  - juicefs
   - pds
   - knot

···
     files:
       - access-key=shared/s3-access-key.secret
       - secret-key=shared/s3-secret-key.secret
+  - name: juicefs-secret
+    namespace: juicefs
+    type: Opaque
+    literals:
+      - name=sans-self-fs
+      - storage=s3
+      - bucket=https://nbg1.your-objectstorage.com/sans-self-net
+      - format-options=trash-days=1
+    files:
+      - metaurl=juicefs/metaurl.secret
+      - access-key=shared/s3-access-key.secret
+      - secret-key=shared/s3-secret-key.secret
-1
k8s/pds/cert.yaml
···
     kind: ClusterIssuer
   dnsNames:
     - sans-self.org
-    - pds-test.sans-self.org
     - vesper.sans-self.org
     - nyx.sans-self.org
+1 -1
k8s/pds/pvc.yaml
···
 spec:
   accessModes:
     - ReadWriteOnce
-  storageClassName: hcloud-volumes
+  storageClassName: juicefs-sc
   resources:
     requests:
       storage: 20Gi
+37 -11
kube.tf
···

   network_region = "eu-central"

-  # Single control-plane node (no HA — acceptable for this workload)
+  # HA control plane — 3 ARM nodes for etcd quorum
   control_plane_nodepools = [
     {
       name = "control-plane",
-      server_type = "cx33",
+      server_type = "cax11",
       location = "nbg1",
       labels = [],
       taints = [],
-      count = 1
+      count = 3
     }
   ]

···
   load_balancer_type = "lb11"
   load_balancer_location = "nbg1"

-  # Ingress — Traefik, single replica (single node, HPA is pointless)
-  ingress_replica_count = 1
+  # Ingress — 2 replicas across 3 nodes for HA
+  ingress_replica_count = 2
   traefik_autoscaling = false

   # Pinned to avoid schema breakage with newer chart versions
···
   # Tangled knot Git SSH passthrough (git clone git@knot.sans-self.org:handle/repo)
   traefik_additional_ports = [{ name = "knot-ssh", port = 22, exposedPort = 22 }]

-  # Storage — Hetzner CSI only, no Longhorn
-  enable_longhorn = false
+  # Storage — JuiceFS CSI handles persistent storage; Hetzner CSI disabled to
+  # avoid the upstream blkid/mkfs bug (kubernetes/kubernetes#95183).
+  # local-path enabled for JuiceFS Redis metadata (can't use JuiceFS for its own metadata)
+  enable_longhorn      = false
+  disable_hetzner_csi  = true
+  enable_local_storage = true
   hetzner_ccm_use_helm = true

   # Kubernetes version & upgrades
···
     "2606:4700:4700::1111",
   ]

-  use_control_plane_lb = false
+  # HA requires control plane LB for stable API endpoint across 3 nodes
+  use_control_plane_lb = true

-  # Firewall — API and SSH restricted to known IPs
-  firewall_kube_api_source = ["5.132.126.116/32", "89.146.51.229/32"]
-  firewall_ssh_source = ["5.132.126.116/32", "89.146.51.229/32"]
+  # Firewall — open, auth is via client certs (kubeconfig) and SSH keys
+  firewall_kube_api_source = ["0.0.0.0/0", "::/0"]
+  firewall_ssh_source = ["0.0.0.0/0", "::/0"]

   extra_firewall_rules = [
     {
···

 provider "hcloud" {
   token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
+}
+
+provider "helm" {
+  kubernetes = {
+    host = module.kube-hetzner.kubeconfig_data.host
+    client_certificate = module.kube-hetzner.kubeconfig_data.client_certificate
+    client_key = module.kube-hetzner.kubeconfig_data.client_key
+    cluster_ca_certificate = module.kube-hetzner.kubeconfig_data.cluster_ca_certificate
+  }
+}
+
+resource "helm_release" "juicefs_csi" {
+  name = "juicefs-csi"
+  namespace = "juicefs"
+  create_namespace = true
+  repository = "https://juicedata.github.io/charts/"
+  chart = "juicefs-csi-driver"
+  version = "0.31.2"
+  values = [file("juicefs-csi-values.yaml")]
+
+  depends_on = [module.kube-hetzner]
 }

 terraform {