Procedures for restoring PDS and knot from S3 backups after data loss.
## Architecture
All persistent volumes are backed by JuiceFS (S3-backed FUSE filesystem). Data survives node rescheduling and pod restarts — the old Hetzner Volumes data-loss-on-reschedule bug is eliminated.

**What's backed up:**
- **PDS**: SQLite databases only (`account.sqlite`, `did_cache.sqlite`, `sequencer.sqlite`). Blob storage is natively on S3 — not on the PVC, not in backups.
- **Knot**: SQLite database (`knotserver.db`) + git repositories (`repositories/`).

**What's not backed up:**
- **Zot registry**: Container images are rebuildable artifacts. No backups.
- **PDS blobs**: Stored natively in S3 by the PDS process. Already durable — not part of backup/restore.

**Schedule:** daily. PDS at 02:00 UTC, knot at 02:30 UTC.
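
The timestamped snapshot names that the restore steps key on can be sketched from the filenames in the bucket. A minimal sketch, assuming the backup job derives keys with `rclone copyto` and a UTC `YYYYMMDD-HHMMSS` suffix (the actual CronJob definition is not shown in this runbook, so this only builds and prints the command; nothing is uploaded):

```sh
# Assumed snapshot-key derivation (matches names like account-20260225-020012.sqlite).
TS=$(date -u +%Y%m%d-%H%M%S)
DEST=":s3:sans-self-net/pds/db/account-${TS}.sqlite"
echo "rclone copyto /pds/account.sqlite ${DEST} \${S3}"
```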

## Diagnosis

Check if services have lost their data:

```sh
# PDS — should return your account, not RepoNotFound

# Knot — should list branches for the repo
curl -s "https://knot.sans-self.org/xrpc/sh.tangled.repo.branches?repo=did:plc:wydyrngmxbcsqdvhmd7whmye/infrastructure"
```

Check volume contents directly:

```sh
kubectl exec -n pds deployment/pds -- ls -la /pds/
kubectl exec -n knot deployment/knot -c knot -- ls -la /home/git/data/
```

## Inspect S3 Backups

Get S3 credentials:

```sh
kubectl get secret -n pds pds-s3-credentials -o jsonpath='{.data.access-key}' | base64 -d
kubectl get secret -n pds pds-s3-credentials -o jsonpath='{.data.secret-key}' | base64 -d
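
The rclone commands in this runbook pass a `${S3}` flag bundle whose exact definition is not shown here. A sketch of what it plausibly contains, assuming Hetzner object storage in `nbg1` (the endpoint URL is a guess; substitute your bucket's real endpoint):

```sh
# Assumption: ${S3} bundles rclone's S3 backend flags. Keys come from the
# secret above; the endpoint is a placeholder, not confirmed by this runbook.
ACCESS_KEY="EXAMPLEACCESSKEY"
SECRET_KEY="examplesecretkey"
S3="--s3-provider Other \
  --s3-endpoint https://nbg1.your-objectstorage.com \
  --s3-access-key-id ${ACCESS_KEY} \
  --s3-secret-access-key ${SECRET_KEY}"
echo "$S3"
```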

List available snapshots:

```sh
kubectl run s3-check --rm -it --restart=Never --image=rclone/rclone:1.69 -- \
  --s3-region nbg1 --s3-no-check-bucket
```

DB snapshots are timestamped (e.g. `account-20260225-020012.sqlite`). Pick the newest one from *before* the data loss.
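
Snapshot selection can be scripted: for this `YYYYMMDD-HHMMSS` scheme, lexicographic order equals chronological order. A sketch with an assumed incident time and an inlined listing (in practice the names would come from the rclone listing above):

```sh
# Hypothetical: pick the newest account snapshot taken before the incident.
INCIDENT="20260226-000000"   # assumed moment of data loss (UTC)
SNAPSHOTS="account-20260224-020011.sqlite
account-20260225-020012.sqlite
account-20260226-020009.sqlite"
echo "$SNAPSHOTS" \
  | sed 's/^account-\(.*\)\.sqlite$/\1/' \
  | awk -v cut="$INCIDENT" '$0 < cut' \
  | sort | tail -1   # prints 20260225-020012
```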

## Restore PDS

### 2. Run restore job

Replace `TIMESTAMP` with the chosen snapshot timestamp (e.g. `20260225-020012`).

Only SQLite databases are restored — blob storage lives natively on S3 and doesn't need restoration.

```yaml
# kubectl apply -f - <<'YAML'
          rclone copyto ":s3:sans-self-net/pds/db/account-TIMESTAMP.sqlite" /data/account.sqlite ${S3}
          rclone copyto ":s3:sans-self-net/pds/db/did_cache-TIMESTAMP.sqlite" /data/did_cache.sqlite ${S3}
          rclone copyto ":s3:sans-self-net/pds/db/sequencer-TIMESTAMP.sqlite" /data/sequencer.sqlite ${S3}

          ls -la /data/
          echo "PDS restore complete"
            claimName: knot-data
```

### 3. Fix post-receive hooks

Restored git repositories may have non-executable `post-receive` hooks (FUSE `default_permissions` prevents root-in-container from chmod on files owned by `git`). Fix as the `git` user:

```sh
kubectl exec -n knot deploy/knot -- su -s /bin/sh git -c \
  'find /home/git/repositories -name post-receive -exec chmod +x {} \;'
```

Without this, pushes land in the bare repo but knot never processes them — no feed updates, no diff indexing, no notifications.
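
To confirm the fix took, a hedged helper that lists hooks still missing the owner execute bit (numeric `-perm -0100` keeps it BusyBox-compatible; in the cluster you would wrap the same `find` in `kubectl exec`):

```sh
# Print post-receive hooks under $1 that lack the owner execute bit.
# -perm -0100 == owner-execute; numeric mode for BusyBox find compatibility.
check_hooks() {
  find "$1" -name post-receive ! -perm -0100
}
```

Inside the pod: `kubectl exec -n knot deploy/knot -- find /home/git/repositories -name post-receive ! -perm -0100`. Empty output means every hook is executable.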

### 4. Fix repo ACLs (if needed)

The knot DB stores per-repo ACL entries. If a repo was created after the backup, its ACL will be missing and pushes will fail with `access denied: user not allowed` even though SSH auth succeeds.

```yaml
            claimName: knot-data
```

### 5. Scale up and verify

```sh
kubectl scale deployment -n knot knot --replicas=1
```

## Known Gotchas

- **PDS blobs are not in backups.** They live natively on S3 via the PDS process. If the S3 bucket itself is lost, blobs are gone. The backup only covers SQLite databases.
- **Choose the right DB snapshot.** Check all available timestamps in S3. The most recent snapshot before data loss is usually best, but if accounts were created between backups, a later snapshot might have more complete account records.
- **Sequencer cursor mismatch kills federation.** Posts succeed locally but don't reach Bluesky. Always bump the sequencer autoincrement past the relay's cursor after restore.
- **Knot ACLs are per-repo.** The server owner can push to repos that have ACL entries. Repos created after the backup will have git data on disk but no ACL — you must add entries manually.
- **Knot post-receive hooks may lose execute permissions.** After restoring from S3, hooks may not be executable due to FUSE `default_permissions`. Must chmod as the `git` user, not root.
- **SSH host keys change on pod restart.** Every knot scale-down/up regenerates sshd host keys. Run `ssh-keygen -R` to clear stale entries.
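
The sequencer bump mentioned in the gotchas might look like the following. The `sqlite_sequence` row name `repo_seq` is an assumption; inspect the actual schema before touching the database. The sketch only builds and prints the SQL:

```sh
# Hypothetical: push the sequencer autoincrement past the relay's cursor.
# RELAY_CURSOR is the cursor reported by the relay; +1000 leaves headroom.
RELAY_CURSOR=123456
SQL="UPDATE sqlite_sequence SET seq = $((RELAY_CURSOR + 1000)) WHERE name = 'repo_seq';"
echo "$SQL"
# apply inside the pod (assumes sqlite3 is available in the image):
#   kubectl exec -n pds deployment/pds -- sqlite3 /pds/sequencer.sqlite "$SQL"
```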