Kubernetes Operator for Tangled Spindles

clean up docs

evan.jarrett.net 532b0e32 dcbdea60

verified
+236 -1681
+1 -1
.devcontainer/devcontainer.json
··· 1 1 { 2 2 "name": "Kubebuilder DevContainer", 3 - "image": "golang:1.24", 3 + "image": "golang:1.25", 4 4 "features": { 5 5 "ghcr.io/devcontainers/features/docker-in-docker:2": {}, 6 6 "ghcr.io/devcontainers/features/git:1": {}
-360
ARCHITECTURE.md
··· 1 - # Loom Architecture 2 - 3 - Loom is a Kubernetes operator that runs tangled.org Spindle with a Kubernetes-native execution engine. 4 - 5 - ## Overview 6 - 7 - ``` 8 - ┌─────────────────────────────────────────────┐ 9 - │ Loom Operator Pod │ 10 - │ │ 11 - │ ┌────────────────────────────────────┐ │ 12 - │ │ Controller Manager │ │ 13 - │ │ - Watches SpindleSet CRD │ │ 14 - │ │ - Monitors Kubernetes Jobs │ │ 15 - │ │ - Reports status to spindle DB │ │ 16 - │ │ - Creates Service endpoints │ │ 17 - │ └────────────────────────────────────┘ │ 18 - │ │ 19 - │ ┌────────────────────────────────────┐ │ 20 - │ │ Embedded Spindle Server │ │ 21 - │ │ │ │ 22 - │ │ HTTP Endpoints: │ │ 23 - │ │ - GET /events (WebSocket) │ │ 24 - │ │ - GET /logs/{knot}/{rkey}/{name} │ │ 25 - │ │ - POST /xrpc/sh.tangled.repo.* │ │ 26 - │ │ │ │ 27 - │ │ Components: │ │ 28 - │ │ - EventConsumer (knot firehose) │ │ 29 - │ │ - Database (SQLite) │ │ 30 - │ │ - Queue (job queue) │ │ 31 - │ │ - Vault (secrets manager) │ │ 32 - │ │ - KubernetesEngine ──────────┐ │ │ 33 - │ └──────────────────────────────│─────┘ │ 34 - └─────────────────────────────────│───────────┘ 35 - 36 - │ creates 37 - 38 - ┌──────────────────────────┐ 39 - │ Kubernetes Jobs │ 40 - │ (one per workflow) │ 41 - │ │ 42 - │ ┌──────────────────────┐ │ 43 - │ │ Init Container: │ │ 44 - │ │ - Clone repo │ │ 45 - │ │ - Checkout commit │ │ 46 - │ └──────────────────────┘ │ 47 - │ │ 48 - │ ┌──────────────────────┐ │ 49 - │ │ Main Container: │ │ 50 - │ │ - Execute all steps │ │ 51 - │ │ in sequence │ │ 52 - │ └──────────────────────┘ │ 53 - └──────────────────────────┘ 54 - ``` 55 - 56 - ## Components 57 - 58 - ### Loom Operator 59 - 60 - The Loom operator is a standard Kubernetes controller that: 61 - 1. Watches `SpindleSet` custom resources 62 - 2. Embeds a spindle server instance 63 - 3. Creates Kubernetes Services to expose spindle HTTP endpoints 64 - 4. 
Monitors Job status and reports to spindle's database 65 - 66 - ### Embedded Spindle Server 67 - 68 - Instead of deploying separate spindle server instances, Loom **embeds** the spindle server: 69 - - Runs in the same process as the controller 70 - - Uses `spindle.New()` to initialize with `KubernetesEngine` 71 - - Handles all spindle functionality: WebSocket connections, XRPC endpoints, database, queue, etc. 72 - 73 - ### KubernetesEngine 74 - 75 - A custom engine implementation that uses Kubernetes Jobs instead of Docker containers. 76 - 77 - ## Execution Model: KubernetesEngine vs NixeryEngine 78 - 79 - ### NixeryEngine Pattern (Docker) 80 - 81 - ``` 82 - ┌─────────────────────────────────────────────┐ 83 - │ SetupWorkflow() │ 84 - │ - docker create <image> cat │ 85 - │ - docker start <container> │ 86 - │ - Keep container running │ 87 - └─────────────────────────────────────────────┘ 88 - 89 - 90 - ┌─────────────────────────────────────────────┐ 91 - │ For each step: │ 92 - │ RunStep() │ 93 - │ - docker exec <container> bash -c <cmd> │ 94 - │ - Stream logs to WorkflowLogger │ 95 - │ - Wait for completion │ 96 - │ - Return exit code │ 97 - └─────────────────────────────────────────────┘ 98 - 99 - 100 - ┌─────────────────────────────────────────────┐ 101 - │ DestroyWorkflow() │ 102 - │ - docker stop <container> │ 103 - │ - docker rm <container> │ 104 - └─────────────────────────────────────────────┘ 105 - ``` 106 - 107 - **Key characteristics:** 108 - - One long-running container per workflow 109 - - Steps executed sequentially via `docker exec` 110 - - Spindle controls step execution timing 111 - - Fine-grained logging per step 112 - 113 - ### KubernetesEngine Pattern (Jobs) 114 - 115 - ``` 116 - ┌─────────────────────────────────────────────┐ 117 - │ SetupWorkflow() │ 118 - │ - Generate bash script with ALL steps │ 119 - │ - Create Kubernetes Job │ 120 - │ - Init container: clone repo │ 121 - │ - Main container: execute script │ 122 - │ - Job runs to completion 
│ 123 - └─────────────────────────────────────────────┘ 124 - 125 - 126 - ┌─────────────────────────────────────────────┐ 127 - │ RunStep() - NO-OP │ 128 - │ (Steps are already running in the Job) │ 129 - └─────────────────────────────────────────────┘ 130 - 131 - 132 - ┌─────────────────────────────────────────────┐ 133 - │ DestroyWorkflow() │ 134 - │ - Delete Kubernetes Job │ 135 - │ - TTL controller auto-cleans after 1 hour │ 136 - └─────────────────────────────────────────────┘ 137 - ``` 138 - 139 - **Key characteristics:** 140 - - One Kubernetes Job per workflow 141 - - All steps baked into a single bash script 142 - - Job runs autonomously after creation 143 - - Logs captured as a single stream per Job 144 - 145 - ## Why Different Execution Models? 146 - 147 - ### Docker: exec pattern is lightweight 148 - - `docker exec` is a cheap operation 149 - - Easy to run commands in existing container 150 - - Low overhead for sequential execution 151 - 152 - ### Kubernetes: exec pattern is heavyweight 153 - - `kubectl exec` involves API calls, auth, streaming setup 154 - - Each exec requires network round-trips 155 - - Kubernetes is designed for declarative workloads, not imperative step-by-step 156 - 157 - ### Jobs are idiomatic Kubernetes 158 - - Jobs are purpose-built for batch workloads 159 - - Native features: status tracking, TTL cleanup, retry policies 160 - - Declarative: describe the work, Kubernetes handles execution 161 - - Better isolation: each workflow is a separate Job 162 - 163 - ### Script generation is efficient 164 - - Single script with all steps: `BuildStepExecutionScript()` 165 - - GitHub Actions-compatible environment passing 166 - - Error handling built into script 167 - - Timestamps and step boundaries in logs 168 - - No coordinator overhead 169 - 170 - ## Trade-offs 171 - 172 - ### What we gain with Jobs: 173 - ✅ Idiomatic Kubernetes 174 - ✅ Simpler implementation 175 - ✅ Better isolation per workflow 176 - ✅ Native Job status and cleanup 177 
- ✅ No persistent connections needed 178 - ✅ Works with standard Kubernetes features 179 - 180 - ### What we lose vs Docker exec: 181 - ❌ No per-step control from spindle 182 - ❌ Can't pause/resume between steps 183 - ❌ Logs are bundled (not separated by step) 184 - ❌ `RunStep()` is a no-op 185 - ❌ Can't dynamically change step execution 186 - 187 - ### Is this okay? 188 - 189 - **Yes!** The execution model difference is intentional and appropriate for Kubernetes. We still: 190 - - Run all steps in order 191 - - Handle errors correctly 192 - - Pass environment between steps 193 - - Report status to spindle DB 194 - - Stream logs 195 - - Support secrets 196 - 197 - The interface contract with spindle's Engine is satisfied, just with a different implementation strategy. 198 - 199 - ## Step Script Generation 200 - 201 - See `pkg/jobbuilder/script_builder.go` for how we generate the bash script. 202 - 203 - ### Features: 204 - - **GitHub Actions compatibility**: `GITHUB_ENV`, `GITHUB_PATH`, `GITHUB_OUTPUT` 205 - - **Environment passing**: Steps can set variables for later steps 206 - - **Error handling**: Script exits on first failure 207 - - **Timestamps**: Every step logs start/end times 208 - - **Step boundaries**: Clear markers in logs 209 - 210 - ### Example generated script: 211 - 212 - ```bash 213 - #!/bin/bash 214 - set -e 215 - set -o pipefail 216 - 217 - # Setup GITHUB_ENV, GITHUB_PATH, GITHUB_OUTPUT 218 - export GITHUB_ENV=/tmp/github/env 219 - export GITHUB_PATH=/tmp/github/path 220 - export GITHUB_OUTPUT=/tmp/github/output 221 - 222 - # Helper functions 223 - add_to_env() { echo "$1=$2" >> $GITHUB_ENV; export "$1"="$2"; } 224 - add_to_path() { echo "$1" >> $GITHUB_PATH; export PATH="$1:$PATH"; } 225 - load_github_env() { source "$GITHUB_ENV" 2>/dev/null || true; } 226 - 227 - ############################################################################## 228 - # Step 1: Build 229 - 
############################################################################## 230 - echo "2025-11-06T15:00:00.000Z ===== Starting Step 1: Build =====" 231 - load_github_env 232 - cd /tangled/workspace 233 - go build ./... || { echo "Step 'Build' failed"; exit 1; } 234 - echo "2025-11-06T15:00:10.000Z ===== Completed Step 1: Build =====" 235 - 236 - ############################################################################## 237 - # Step 2: Test 238 - ############################################################################## 239 - echo "2025-11-06T15:00:10.000Z ===== Starting Step 2: Test =====" 240 - load_github_env 241 - cd /tangled/workspace 242 - go test ./... || { echo "Step 'Test' failed"; exit 1; } 243 - echo "2025-11-06T15:00:20.000Z ===== Completed Step 2: Test =====" 244 - 245 - exit 0 246 - ``` 247 - 248 - ## Configuration 249 - 250 - Loom reads spindle configuration from environment variables (via `tangled.org/core/spindle/config`): 251 - 252 - ```bash 253 - SPINDLE_SERVER_LISTEN_ADDR=0.0.0.0:6555 254 - SPINDLE_SERVER_DB_PATH=/data/spindle.db 255 - SPINDLE_SERVER_HOSTNAME=loom.example.com 256 - SPINDLE_SERVER_OWNER=did:web:example.com 257 - SPINDLE_SERVER_QUEUE_SIZE=100 258 - SPINDLE_SERVER_MAX_JOB_COUNT=2 259 - SPINDLE_SERVER_LOG_DIR=/var/log/spindle 260 - SPINDLE_SERVER_SECRETS_PROVIDER=sqlite 261 - ``` 262 - 263 - These are typically set in the Loom Deployment manifest. 264 - 265 - ## Status Reporting 266 - 267 - Status flows through spindle's event system: 268 - 269 - 1. **Job status changes** detected by controller 270 - 2. **Write to spindle DB**: `db.StatusPending/Running/Success/Failed()` 271 - 3. **Broadcast via Notifier**: `notifier.NotifyAll()` 272 - 4. **Stream to subscribers**: Knots connected to `/events` WebSocket receive updates 273 - 274 - This follows spindle's standard pattern - no custom status reporter needed. 
275 - 276 - ## Networking 277 - 278 - ### Internal (Cluster) 279 - - Spindle HTTP server listens on `:6555` inside operator pod 280 - - Controller creates a Kubernetes Service to expose it 281 - - Service name: `<spindleset-name>-spindle` 282 - - Endpoints available at: `http://<service>.<namespace>.svc.cluster.local:6555` 283 - 284 - ### External Access 285 - Options for external access: 286 - 1. **NodePort**: Change Service type to expose on node IP 287 - 2. **LoadBalancer**: Create cloud load balancer 288 - 3. **Ingress**: Route external traffic to Service 289 - 4. **Port forwarding**: For testing: `kubectl port-forward svc/spindleset-sample-spindle 6555:6555` 290 - 291 - ## Multi-Architecture Support 292 - 293 - Jobs can target specific node architectures using NodeAffinity: 294 - 295 - ```yaml 296 - spec: 297 - template: 298 - architecture: arm64 # or amd64 299 - ``` 300 - 301 - The jobbuilder generates NodeAffinity rules: 302 - ```yaml 303 - affinity: 304 - nodeAffinity: 305 - requiredDuringSchedulingIgnoredDuringExecution: 306 - nodeSelectorTerms: 307 - - matchExpressions: 308 - - key: kubernetes.io/arch 309 - operator: In 310 - values: [arm64] 311 - ``` 312 - 313 - ## Future Enhancements 314 - 315 - ### 1. Structured Logs Per Step 316 - Have the Job script POST progress updates: 317 - ```bash 318 - curl -X POST $SPINDLE_STATUS_URL -d '{"step": 1, "status": "start"}' 319 - go build ./... 320 - curl -X POST $SPINDLE_STATUS_URL -d '{"step": 1, "status": "complete"}' 321 - ``` 322 - 323 - ### 2. Step-level Artifacts 324 - Mount a PVC for step outputs: 325 - ```yaml 326 - volumes: 327 - - name: artifacts 328 - persistentVolumeClaim: 329 - claimName: workflow-artifacts 330 - ``` 331 - 332 - ### 3. Parallel Steps 333 - Support running independent steps in parallel (separate containers in same pod). 334 - 335 - ### 4. 
Dynamic Step Execution 336 - For advanced use cases, implement the exec pattern: 337 - - Create long-running Pod (not Job) 338 - - Implement `RunStep()` with `kubectl exec` 339 - - Trade efficiency for flexibility 340 - 341 - ## Comparison with Other Engines 342 - 343 - | Feature | NixeryEngine | KubernetesEngine | 344 - |---------|-------------|------------------| 345 - | Runtime | Docker | Kubernetes | 346 - | Isolation | Container | Job (Pod) | 347 - | Step execution | Sequential exec | Baked script | 348 - | RunStep() | Active | No-op | 349 - | Overhead | Low | Medium (K8s API) | 350 - | Idiomatic | Yes (Docker) | Yes (K8s) | 351 - | Scalability | Limited | High | 352 - | Multi-tenancy | Shared host | Cluster-native | 353 - 354 - ## References 355 - 356 - - **Spindle Core**: `/home/data/core/spindle/` 357 - - **NixeryEngine**: `/home/data/core/spindle/engines/nixery/engine.go` 358 - - **KubernetesEngine**: `/home/data/loom/internal/engine/kubernetes_engine.go` 359 - - **Job Builder**: `/home/data/loom/pkg/jobbuilder/` 360 - - **Script Builder**: `/home/data/loom/pkg/jobbuilder/script_builder.go`
BUILDAH_IMPLEMENTATION.md → docs/proposals/BUILDAH_IMPLEMENTATION.md
+80 -269
CLAUDE.md
··· 1 1 # Working with Claude Code on Loom 2 2 3 - This document describes how Claude Code was used to develop the Loom operator and provides guidelines for continuing development with AI assistance. 3 + Guidelines for developing the Loom operator with AI assistance. 4 4 5 5 ## Project Context 6 6 7 7 **What is Loom?** 8 - Loom is a Kubernetes operator that coordinates tangled.org Spindles - ephemeral CI/CD runners inspired by GitHub's Actions Runner Controller (ARC). It enables running pipeline workflows in Kubernetes in response to events from tangled.org knots. 8 + Loom is a Kubernetes operator that runs CI/CD pipelines from tangled.org. It creates ephemeral Jobs in response to WebSocket events and streams logs back to the platform. 9 9 10 10 **Key Characteristics:** 11 11 - Built with operator-sdk and Kubebuilder 12 12 - Integrates with tangled.org's AT Protocol-based event system 13 - - Reuses components from `tangled.org/core/spindle` 14 - - Implements a Kubernetes-native execution engine 15 - 16 - ## Development Approach 17 - 18 - ### Research Phase 19 - Claude Code researched three key areas before implementation: 13 + - Reuses `tangled.org/core/spindle` models and interfaces 14 + - Implements a Kubernetes-native `Engine` for Job-based execution 20 15 21 - 1. **GitHub Actions Runner Controller (ARC)** 22 - - Studied three-tier controller architecture 23 - - Learned JIT token patterns for security 24 - - Understood ephemeral runner lifecycle 25 - - Reviewed auto-scaling mechanisms 16 + ## Code Organization 26 17 27 - 2. 
**Chainguard Kaniko Fork** 28 - - Investigated rootless container builds 29 - - Understood security benefits 30 - - Evaluated integration patterns (deferred for MVP) 18 + ``` 19 + loom/ 20 + ├── api/v1alpha1/ # CRD types (SpindleSet) 21 + ├── cmd/ 22 + │ ├── controller/ # Operator entry point 23 + │ └── runner/ # Binary injected into Job pods 24 + ├── internal/ 25 + │ ├── controller/ # SpindleSet reconciliation 26 + │ ├── engine/ # KubernetesEngine (implements spindle.Engine) 27 + │ └── jobbuilder/ # Job spec generation 28 + └── config/ # Kubernetes manifests 29 + ``` 31 30 32 - 3. **tangled.org Spindles** 33 - - Analyzed existing implementation in `/home/data/core/spindle` 34 - - Studied Engine interface and models 35 - - Understood WebSocket event ingestion 36 - - Reviewed Nixery integration (simplified for MVP) 31 + ## Design Decisions 37 32 38 - ### Architectural Decisions 33 + ### Why Jobs Instead of Exec? 39 34 40 - **Simplifications Made:** 41 - 1. **No Nixery for MVP**: Use standard Docker images instead of dynamic Nix-based images 42 - 2. **Ephemeral Jobs**: Scale-to-zero approach, one Job per pipeline 43 - 3. **Kubernetes-native logging**: Stream logs via K8s API instead of disk-based 44 - 4. **Direct WebSocket**: Maintain persistent connection to knot (not polling) 35 + The upstream `NixeryEngine` uses `docker exec` to run steps in a long-running container. 
Loom uses Kubernetes Jobs instead: 45 36 46 - **Code Reuse Strategy:** 47 - - Import models and interfaces from `tangled.org/core/spindle` 48 - - Reuse WebSocket client logic 49 - - Implement new `KubernetesEngine` for Job-based execution 50 - - Avoid duplicating existing functionality 37 + | Aspect | Docker (NixeryEngine) | Kubernetes (Loom) | 38 + |--------|----------------------|-------------------| 39 + | Step execution | `docker exec` per step | Runner binary in Job | 40 + | Container lifecycle | Long-running, reused | Ephemeral, one per workflow | 41 + | Log streaming | Per-step via exec | JSON events via pod logs | 42 + | Overhead | Low (exec is cheap) | Medium (Job creation) | 51 43 52 - ## Claude Code Usage Guidelines 44 + **Why this works for Kubernetes:** 45 + - `kubectl exec` is heavyweight (API calls, auth, streaming setup) 46 + - Jobs are idiomatic for batch workloads 47 + - Native status tracking, TTL cleanup, retry policies 48 + - Better isolation: each workflow is a separate Job 53 49 54 - ### When to Use Claude Code 50 + ### Runner Binary Approach 55 51 56 - **Good Use Cases:** 57 - - Implementing boilerplate (CRDs, controllers, builders) 58 - - Generating Kubernetes manifests and RBAC 59 - - Creating test scaffolding 60 - - Refactoring for clarity 61 - - Adding logging and metrics 62 - - Updating documentation 52 + Instead of generating a bash script with all steps, Loom injects a `runner` binary that: 53 + 1. Reads workflow spec from `LOOM_WORKFLOW_SPEC` env var 54 + 2. Executes steps sequentially 55 + 3. Emits JSON log events (`{"kind":"control","stepId":0,"stepStatus":"start"}`) 56 + 4. Controller streams pod logs and parses events 63 57 64 - **Not Recommended:** 65 - - Security-critical authentication logic (review carefully) 66 - - Complex AT Protocol interactions (defer to existing code) 67 - - Performance-critical paths (benchmark first) 58 + This enables per-step status tracking while keeping the Job model. 
68 59 69 - ### Prompting Best Practices 60 + ## Common Patterns 70 61 71 - **Provide Context:** 72 - ``` 73 - I want to add [feature]. We're using the KubernetesEngine from 74 - internal/engine/kubernetes_engine.go which implements the Engine 75 - interface from tangled.org/core/spindle/models. The feature should 76 - [specific behavior]. 77 - ``` 78 - 79 - **Reference Existing Code:** 80 - ``` 81 - Look at how core/spindle/engines/nixery/engine.go handles [X]. 82 - We need similar logic in our KubernetesEngine but adapted for 83 - Kubernetes Jobs instead of Docker containers. 84 - ``` 85 - 86 - **Specify Constraints:** 87 - ``` 88 - Implement [feature] but: 89 - 1. Reuse the existing Job builder pattern 90 - 2. Add Prometheus metrics 91 - 3. Follow the error handling pattern from the controller 92 - 4. Don't break multi-arch support 93 - ``` 94 - 95 - ### Common Patterns in This Project 96 - 97 - **1. Job Creation Pattern** 62 + ### Job Creation 98 63 ```go 99 - // Jobs are owned by SpindleSet for automatic cleanup 64 + // Jobs owned by SpindleSet for automatic cleanup 100 65 ctrl.SetControllerReference(spindleSet, job, r.Scheme) 101 66 102 - // Jobs labeled for querying 67 + // Labels for querying 103 68 labels := map[string]string{ 104 - "loom.j5t.io/spindleset": spindleSet.Name, 105 - "loom.j5t.io/workflow": workflowName, 69 + "loom.j5t.io/spindleset": spindleSet.Name, 70 + "loom.j5t.io/pipeline-id": pipelineID, 71 + "loom.j5t.io/workflow": workflowName, 106 72 } 107 73 ``` 108 74 109 - **2. 
Status Update Pattern** 75 + ### Multi-Arch Node Targeting 110 76 ```go 111 - // Always update status in defer to ensure updates even on errors 112 - defer func() { 113 - if err := r.Status().Update(ctx, spindleSet); err != nil { 114 - log.Error(err, "Failed to update SpindleSet status") 115 - } 116 - }() 77 + // Architecture from workflow spec -> node affinity 78 + affinity := BuildArchitectureAffinity(workflow.Architecture) // "amd64" or "arm64" 117 79 ``` 118 80 119 - **3. Multi-Arch Node Targeting** 81 + ### Resource Profile Selection 120 82 ```go 121 - // Architecture specified in workflow, translated to node affinity 122 - affinity := &corev1.Affinity{ 123 - NodeAffinity: &corev1.NodeAffinity{ 124 - RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{ 125 - NodeSelectorTerms: []corev1.NodeSelectorTerm{{ 126 - MatchExpressions: []corev1.NodeSelectorRequirement{{ 127 - Key: "kubernetes.io/arch", 128 - Operator: corev1.NodeSelectorOpIn, 129 - Values: []string{workflow.Architecture}, 130 - }}, 131 - }}, 132 - }, 133 - }, 134 - } 83 + // Select profile based on architecture and available nodes 84 + resources, nodeSelector := selectResourceProfile(profiles, architecture, nodes) 135 85 ``` 136 86 137 - **4. Metrics Pattern** 138 - ```go 139 - // Register metrics in init() 140 - func init() { 141 - metrics.Registry.MustRegister(runningSpindles, completedSpindles) 142 - } 143 - 144 - // Update metrics in controller 145 - runningSpindles.Set(float64(spindleSet.Status.RunningJobs)) 146 - ``` 147 - 148 - ## Development Workflow 149 - 150 - ### Standard Development Cycle 151 - 152 - 1. **Plan**: Update `PLAN.md` with new feature/phase 153 - 2. **Implement**: Generate code with Claude Code 154 - 3. **Test**: Run unit tests (`make test`) 155 - 4. **Generate**: Update manifests (`make manifests`) 156 - 5. **Deploy**: Test in cluster (`make deploy`) 157 - 6. 
**Iterate**: Fix issues, update docs 158 - 159 - ### Key Commands 87 + ## Key Commands 160 88 161 89 ```bash 162 - # Generate CRDs and code 163 - make manifests generate 164 - 165 - # Run tests 166 - make test 167 - 168 - # Build operator 169 - make build 170 - 171 - # Deploy to cluster 172 - make deploy IMG=<your-registry>/loom:tag 173 - 174 - # Run locally (for debugging) 175 - make install run 176 - 177 - # Clean up 178 - make undeploy 90 + make manifests generate # Regenerate CRDs and code 91 + make test # Run unit tests 92 + make build # Build operator 93 + make deploy IMG=<image> # Deploy to cluster 94 + make install run # Run locally for debugging 179 95 ``` 180 96 181 - ### Testing Strategy 97 + ## Common Tasks 182 98 183 - **Unit Tests:** 184 - - Mock Kubernetes clients 185 - - Test Job builder logic 186 - - Test affinity generation 187 - - Test script builder 99 + ### Adding a Field to SpindleSet 100 + 1. Edit `api/v1alpha1/spindleset_types.go` 101 + 2. Add kubebuilder markers 102 + 3. Run `make manifests generate` 103 + 4. Update controller logic 104 + 5. Add tests 188 105 189 - **Integration Tests:** 190 - - Deploy to test cluster (kind/k3s) 191 - - Create SpindleSet CR 192 - - Simulate pipeline events 193 - - Verify Jobs created correctly 194 - - Check logs and status 106 + ### Changing Job Template 107 + 1. Edit `internal/jobbuilder/job_template.go` 108 + 2. Run `make test` 109 + 3. 
Deploy and verify 195 110 196 - **Manual Testing:** 197 - - Deploy to real cluster with mixed amd64/arm64 nodes 198 - - Connect to real tangled.org knot 199 - - Trigger actual pipeline runs 200 - - Monitor with Prometheus/Grafana 201 - 202 - ## Code Organization 203 - 204 - ### Package Structure 205 - 206 - ``` 207 - loom/ 208 - ├── api/v1alpha1/ # CRD types (SpindleSet) 209 - ├── internal/ 210 - │ ├── controller/ # Reconciliation logic 211 - │ └── engine/ # KubernetesEngine implementation 212 - ├── pkg/ 213 - │ ├── ingester/ # WebSocket client 214 - │ ├── jobbuilder/ # Job template generation 215 - │ └── knot/ # Knot API client 216 - └── config/ # Kubernetes manifests 217 - ``` 218 - 219 - ### Import Guidelines 111 + ## Imports 220 112 221 113 **From tangled.org/core:** 222 114 ```go 223 - // Models and interfaces 224 - import "tangled.org/core/spindle/models" 225 - import "tangled.org/core/api/tangled" 226 - 227 - // Adapt, don't import directly: 228 - // - WebSocket client (pkg/ingester) 229 - // - Status reporting (pkg/knot) 115 + import "tangled.org/core/spindle/models" // Engine interface, Workflow, Step 116 + import "tangled.org/core/api/tangled" // Pipeline types 117 + import "tangled.org/core/spindle/secrets" // Secret management 230 118 ``` 231 119 232 120 **Kubernetes:** ··· 236 124 import ctrl "sigs.k8s.io/controller-runtime" 237 125 ``` 238 126 239 - ## Common Tasks 240 - 241 - ### Adding a New Field to SpindleSet 242 - 243 - 1. Edit `api/v1alpha1/spindleset_types.go` 244 - 2. Add field with proper kubebuilder markers 245 - 3. Run `make manifests generate` 246 - 4. Update sample CRs in `config/samples/` 247 - 5. Update controller logic to use field 248 - 6. Add tests for new behavior 249 - 250 - ### Changing Job Template 251 - 252 - 1. Edit `pkg/jobbuilder/job_template.go` 253 - 2. Update Job generation logic 254 - 3. Run unit tests (`make test`) 255 - 4. Deploy and test (`make deploy`) 256 - 5. 
Update documentation if needed 257 - 258 - ### Adding Prometheus Metrics 259 - 260 - 1. Define metric in controller file 261 - 2. Register in `init()` function 262 - 3. Update metric in reconciliation loop 263 - 4. Add metric to Prometheus config 264 - 5. Document in `PLAN.md` metrics section 265 - 266 127 ## Troubleshooting 267 128 268 - ### Common Issues 269 - 270 - **Issue: CRD not updating** 129 + **CRD not updating:** 271 130 ```bash 272 - # Regenerate and reinstall 273 - make manifests 274 - make install 275 - ``` 276 - 277 - **Issue: Controller not reconciling** 278 - ```bash 279 - # Check logs 280 - kubectl logs -n loom-system deployment/loom-controller-manager 281 - 282 - # Check RBAC 283 - kubectl auth can-i create jobs --as=system:serviceaccount:loom-system:loom-controller-manager 131 + make manifests && make install 284 132 ``` 285 133 286 - **Issue: Jobs stuck pending** 134 + **Jobs stuck pending:** 287 135 ```bash 288 - # Check job spec 289 - kubectl get job <job-name> -o yaml 290 - 291 - # Check events 292 136 kubectl describe job <job-name> 293 - 294 - # Check node availability 295 137 kubectl get nodes -L kubernetes.io/arch 296 138 ``` 297 139 298 - **Issue: WebSocket disconnecting** 140 + **Controller issues:** 299 141 ```bash 300 - # Check SpindleSet status 301 - kubectl get spindleset -o wide 302 - 303 - # Check controller logs for connection errors 304 - kubectl logs -n loom-system deployment/loom-controller-manager | grep -i websocket 142 + kubectl logs -n loom-system deployment/loom-controller-manager 305 143 ``` 306 144 307 - ## Contributing 308 - 309 - ### Before Asking Claude Code for Help 310 - 311 - 1. Read the existing code in the affected area 312 - 2. Check `PLAN.md` for architectural context 313 - 3. Review similar implementations in the codebase 314 - 4. Check tangled.org/core for reusable components 315 - 316 - ### After Getting Code from Claude Code 317 - 318 - 1. Review generated code for correctness 319 - 2. 
Run tests (`make test`) 320 - 3. Check for proper error handling 321 - 4. Verify metrics are updated 322 - 5. Update documentation 323 - 6. Test in real cluster 324 - 325 - ## Resources 326 - 327 - - **Kubebuilder Book**: https://book.kubebuilder.io/ 328 - - **Controller Runtime**: https://pkg.go.dev/sigs.k8s.io/controller-runtime 329 - - **GitHub ARC**: https://github.com/actions/actions-runner-controller 330 - - **tangled.org Core**: /home/data/core/ 331 - - **Operator SDK**: https://sdk.operatorframework.io/ 332 - 333 145 ## Notes 334 146 335 - - This project prioritizes simplicity over completeness for MVP 336 - - Code reuse from tangled.org/core is preferred over reimplementation 337 147 - Multi-architecture support is a first-class concern 338 - - Prometheus metrics are essential, not optional 339 - - Security (RBAC, secrets) should be carefully reviewed 148 + - Jobs run as non-root (UID 1000) with minimal capabilities 149 + - Buildah support requires unconfined seccomp + SETUID/SETGID caps 150 + - SpindleSets are internal resources created by the engine, not users
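The runner binary's JSON log events mentioned in the updated CLAUDE.md can be illustrated with a small Go sketch. Only the one example line in the doc (`{"kind":"control","stepId":0,"stepStatus":"start"}`) is from the source; the `ControlEvent` struct shape and the `emit`/`parseLine` helpers are assumptions for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ControlEvent is a hypothetical shape for the runner's per-step status
// events; field names match the example in CLAUDE.md.
type ControlEvent struct {
	Kind       string `json:"kind"`
	StepID     int    `json:"stepId"`
	StepStatus string `json:"stepStatus"` // e.g. "start", "complete"
}

// emit renders one event as a single log line, as the runner would write
// it to stdout for the controller to pick up from the pod log stream.
func emit(e ControlEvent) string {
	b, _ := json.Marshal(e)
	return string(b)
}

// parseLine separates control events from ordinary step output: a line
// that unmarshals with kind "control" is status metadata, anything else
// is treated as plain log text.
func parseLine(line string) (ControlEvent, bool) {
	var e ControlEvent
	if err := json.Unmarshal([]byte(line), &e); err != nil || e.Kind != "control" {
		return ControlEvent{}, false
	}
	return e, true
}

func main() {
	line := emit(ControlEvent{Kind: "control", StepID: 0, StepStatus: "start"})
	fmt.Println(line)

	if e, ok := parseLine(line); ok {
		fmt.Printf("step %d: %s\n", e.StepID, e.StepStatus)
	}
	if _, ok := parseLine("go: downloading modules"); !ok {
		fmt.Println("plain log line, passed through as step output")
	}
}
```

Multiplexing status over the existing pod log stream is what lets the Job model regain per-step tracking without `kubectl exec` or a persistent connection into the pod.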
-549
CONFIGURATION.md
··· 1 - # Loom Configuration Guide 2 - 3 - Loom is configured via environment variables that are passed to the embedded spindle server. 4 - 5 - ## Quick Start 6 - 7 - 1. Create namespace: `kubectl create namespace loom-system` 8 - 2. Apply CRDs: `kubectl apply -f config/crd/bases/` 9 - 3. Apply RBAC: `kubectl apply -f config/rbac/` 10 - 4. Deploy operator with config (see below) 11 - 5. Create SpindleSet: `kubectl apply -f config/samples/` 12 - 13 - ## Required Environment Variables 14 - 15 - These must be set in the Loom operator Deployment: 16 - 17 - ```yaml 18 - apiVersion: apps/v1 19 - kind: Deployment 20 - metadata: 21 - name: loom-controller-manager 22 - namespace: loom-system 23 - spec: 24 - replicas: 1 25 - selector: 26 - matchLabels: 27 - control-plane: controller-manager 28 - template: 29 - metadata: 30 - labels: 31 - control-plane: controller-manager 32 - spec: 33 - containers: 34 - - name: manager 35 - image: loom:latest 36 - env: 37 - # Required: Spindle server configuration 38 - - name: SPINDLE_SERVER_LISTEN_ADDR 39 - value: "0.0.0.0:6555" 40 - 41 - - name: SPINDLE_SERVER_DB_PATH 42 - value: "/data/spindle.db" 43 - 44 - - name: SPINDLE_SERVER_HOSTNAME 45 - value: "loom.example.com" # Change to your domain 46 - 47 - - name: SPINDLE_SERVER_OWNER 48 - value: "did:web:example.com" # Change to your DID 49 - 50 - - name: SPINDLE_SERVER_JETSTREAM_ENDPOINT 51 - value: "wss://jetstream1.us-west.bsky.network/subscribe" 52 - 53 - # Optional: Adjust queue settings 54 - - name: SPINDLE_SERVER_QUEUE_SIZE 55 - value: "100" 56 - 57 - - name: SPINDLE_SERVER_MAX_JOB_COUNT 58 - value: "2" # Max concurrent workflows 59 - 60 - # Optional: Log directory (not used much in K8s) 61 - - name: SPINDLE_SERVER_LOG_DIR 62 - value: "/var/log/spindle" 63 - 64 - # Optional: Development mode 65 - - name: SPINDLE_SERVER_DEV 66 - value: "false" 67 - 68 - # Optional: Secrets provider (default: sqlite) 69 - - name: SPINDLE_SERVER_SECRETS_PROVIDER 70 - value: "sqlite" 71 - 72 - ports: 73 - 
- containerPort: 6555 74 - name: http 75 - protocol: TCP 76 - 77 - volumeMounts: 78 - - name: data 79 - mountPath: /data 80 - 81 - volumes: 82 - - name: data 83 - emptyDir: {} # Or use PersistentVolume for persistence 84 - ``` 85 - 86 - ## Environment Variable Reference 87 - 88 - ### SPINDLE_SERVER_LISTEN_ADDR 89 - **Required**: Yes 90 - **Default**: `0.0.0.0:6555` 91 - **Description**: Address and port for spindle HTTP server 92 - **Example**: `0.0.0.0:6555` 93 - 94 - The spindle HTTP server exposes: 95 - - `GET /events` - WebSocket for status updates 96 - - `GET /logs/{knot}/{rkey}/{name}` - WebSocket for logs 97 - - `POST /xrpc/*` - XRPC endpoints 98 - 99 - ### SPINDLE_SERVER_DB_PATH 100 - **Required**: Yes 101 - **Default**: `spindle.db` 102 - **Description**: Path to SQLite database file 103 - **Example**: `/data/spindle.db` 104 - 105 - **Important**: Use a persistent volume if you want to retain data across pod restarts. 106 - 107 - ### SPINDLE_SERVER_HOSTNAME 108 - **Required**: Yes 109 - **Default**: None 110 - **Description**: Hostname for this spindle instance (used in DID) 111 - **Example**: `loom.example.com` 112 - 113 - This is used to construct the spindle's DID: `did:web:<hostname>` 114 - 115 - ### SPINDLE_SERVER_OWNER 116 - **Required**: Yes 117 - **Default**: None 118 - **Description**: DID of the spindle owner 119 - **Example**: `did:web:example.com` or `did:plc:abc123...` 120 - 121 - The owner has full control over the spindle instance. 
122 - 123 - ### SPINDLE_SERVER_JETSTREAM_ENDPOINT 124 - **Required**: Yes 125 - **Default**: `wss://jetstream1.us-west.bsky.network/subscribe` 126 - **Description**: Bluesky jetstream endpoint for ingesting member/repo records 127 - **Example**: `wss://jetstream1.us-west.bsky.network/subscribe` 128 - 129 - Used to watch for: 130 - - `sh.tangled.spindleMember` records 131 - - `sh.tangled.repo` records 132 - - `sh.tangled.repoCollaborator` records 133 - 134 - ### SPINDLE_SERVER_QUEUE_SIZE 135 - **Required**: No 136 - **Default**: `100` 137 - **Description**: Maximum number of jobs that can be queued 138 - **Example**: `100` 139 - 140 - If queue is full, new pipeline events will be rejected. 141 - 142 - ### SPINDLE_SERVER_MAX_JOB_COUNT 143 - **Required**: No 144 - **Default**: `2` 145 - **Description**: Maximum number of workflows running concurrently 146 - **Example**: `5` 147 - 148 - Controls how many Kubernetes Jobs will run in parallel. 149 - 150 - ### SPINDLE_SERVER_LOG_DIR 151 - **Required**: No 152 - **Default**: `/var/log/spindle` 153 - **Description**: Directory for workflow logs (not heavily used in Kubernetes) 154 - **Example**: `/var/log/spindle` 155 - 156 - KubernetesEngine logs are primarily captured via Kubernetes pod logs, not local files. 157 - 158 - ### SPINDLE_SERVER_DEV 159 - **Required**: No 160 - **Default**: `false` 161 - **Description**: Enable development mode 162 - **Example**: `true` 163 - 164 - Development mode may skip some validations or enable verbose logging. 165 - 166 - ### SPINDLE_SERVER_SECRETS_PROVIDER 167 - **Required**: No 168 - **Default**: `sqlite` 169 - **Description**: Secrets storage backend 170 - **Options**: `sqlite`, `openbao` 171 - 172 - #### SQLite Provider (default) 173 - Stores secrets in the SQLite database. 174 - 175 - No additional configuration needed. 176 - 177 - #### OpenBao Provider 178 - Stores secrets in OpenBao vault. 
179 - 180 - **Additional environment variables required:** 181 - ```yaml 182 - - name: SPINDLE_SERVER_SECRETS_PROVIDER 183 - value: "openbao" 184 - 185 - - name: SPINDLE_SERVER_SECRETS_OPENBAO_PROXY_ADDR 186 - value: "http://openbao:8200" 187 - 188 - - name: SPINDLE_SERVER_SECRETS_OPENBAO_MOUNT 189 - value: "spindle" # Default mount path 190 - ``` 191 - 192 - ## Security Model 193 - 194 - ### Secrets Management 195 - 196 - Loom integrates with the embedded spindle server's secrets management system: 197 - 198 - **Adding Secrets:** 199 - ```bash 200 - curl -X POST http://loom:6555/xrpc/sh.tangled.repo.addSecret \ 201 - -H "Authorization: Bearer <did-token>" \ 202 - -d '{ 203 - "repo": "at://did:plc:your-did/sh.tangled.repo/your-repo", 204 - "key": "NPM_TOKEN", 205 - "value": "npm_xxxxx" 206 - }' 207 - ``` 208 - 209 - **How Secrets Work:** 210 - 1. Secrets are stored in the vault (SQLite or OpenBao) 211 - 2. When a pipeline runs, Loom retrieves secrets for that repository 212 - 3. A Kubernetes Secret is created per SpindleSet 213 - 4. 
Job pods receive secrets as environment variables via `envFrom` 214 - 215 - **Important Notes:** 216 - - Secret keys must be valid bash identifiers (`^[a-zA-Z_][a-zA-Z0-9_]*$`) 217 - - Secrets are injected directly (e.g., `NPM_TOKEN`, not `TANGLED_NPM_TOKEN`) 218 - - Cluster operators with filesystem/database access can view secrets 219 - - For operator-blind secrets, configure OpenBao with seal/unseal or cloud KMS 220 - 221 - ### Job Pod Security 222 - 223 - Job pods run with hardened security contexts: 224 - 225 - ```yaml 226 - securityContext: 227 - runAsNonRoot: true 228 - runAsUser: 10000 229 - readOnlyRootFilesystem: true 230 - allowPrivilegeEscalation: false 231 - capabilities: 232 - drop: ["ALL"] 233 - ``` 234 - 235 - **Service Account Isolation:** 236 - - Jobs use `spindle-job-runner` ServiceAccount (not controller SA) 237 - - ServiceAccount token mounting is disabled 238 - - Jobs have zero Kubernetes API permissions 239 - - Prevents jobs from reading other repos' secrets via K8s API 240 - 241 - ## Persistence 242 - 243 - ### SQLite Database 244 - 245 - The SQLite database contains: 246 - - Repos being watched 247 - - Spindle members 248 - - Pipeline events 249 - - Status history 250 - - Secrets (if using sqlite provider) 251 - 252 - **For production, use a PersistentVolume:** 253 - 254 - ```yaml 255 - volumes: 256 - - name: data 257 - persistentVolumeClaim: 258 - claimName: spindle-data 259 - 260 - --- 261 - apiVersion: v1 262 - kind: PersistentVolumeClaim 263 - metadata: 264 - name: spindle-data 265 - namespace: loom-system 266 - spec: 267 - accessModes: 268 - - ReadWriteOnce 269 - resources: 270 - requests: 271 - storage: 10Gi 272 - ``` 273 - 274 - ### Log Directory 275 - 276 - Not critical for Kubernetes deployments since logs are captured via pod logs. 277 - 278 - You can omit the log directory volume or use emptyDir. 
279 - 280 - ## Service Configuration 281 - 282 - The controller automatically creates a Kubernetes Service to expose the spindle HTTP server: 283 - 284 - ```yaml 285 - apiVersion: v1 286 - kind: Service 287 - metadata: 288 - name: spindleset-sample-spindle 289 - namespace: default 290 - spec: 291 - selector: 292 - control-plane: controller-manager # Selects Loom operator pod 293 - ports: 294 - - name: http 295 - port: 6555 296 - targetPort: 6555 297 - type: ClusterIP 298 - ``` 299 - 300 - ### External Access 301 - 302 - **Option 1: NodePort** 303 - ```yaml 304 - spec: 305 - type: NodePort 306 - ports: 307 - - name: http 308 - port: 6555 309 - targetPort: 6555 310 - nodePort: 30655 311 - ``` 312 - 313 - **Option 2: LoadBalancer** 314 - ```yaml 315 - spec: 316 - type: LoadBalancer 317 - ports: 318 - - name: http 319 - port: 6555 320 - targetPort: 6555 321 - ``` 322 - 323 - **Option 3: Ingress** 324 - ```yaml 325 - apiVersion: networking.k8s.io/v1 326 - kind: Ingress 327 - metadata: 328 - name: spindle-ingress 329 - namespace: loom-system 330 - spec: 331 - rules: 332 - - host: loom.example.com 333 - http: 334 - paths: 335 - - path: / 336 - pathType: Prefix 337 - backend: 338 - service: 339 - name: spindleset-sample-spindle 340 - port: 341 - number: 6555 342 - ``` 343 - 344 - **Option 4: Port Forward (Testing)** 345 - ```bash 346 - kubectl port-forward -n loom-system svc/spindleset-sample-spindle 6555:6555 347 - ``` 348 - 349 - ## SpindleSet CRD 350 - 351 - The SpindleSet resource configures workflow execution (not spindle server config): 352 - 353 - ```yaml 354 - apiVersion: loom.j5t.io/v1alpha1 355 - kind: SpindleSet 356 - metadata: 357 - name: spindleset-sample 358 - namespace: default 359 - spec: 360 - # URL of the knot this spindle serves (currently not used) 361 - knotUrl: https://knot1.tangled.sh 362 - 363 - # Kubernetes secret containing auth credentials (currently not used) 364 - knotAuthSecret: spindle-auth 365 - 366 - # Maximum concurrent workflows (not 
enforced yet - uses SPINDLE_SERVER_MAX_JOB_COUNT) 367 - maxConcurrentJobs: 5 368 - 369 - # Template for Kubernetes Jobs 370 - template: 371 - # Resource limits for workflow pods 372 - resources: 373 - requests: 374 - cpu: "500m" 375 - memory: "1Gi" 376 - limits: 377 - cpu: "2" 378 - memory: "4Gi" 379 - 380 - # Node selector for scheduling 381 - nodeSelector: 382 - disktype: ssd 383 - 384 - # Tolerations for taints 385 - tolerations: 386 - - key: workload 387 - operator: Equal 388 - value: ci 389 - effect: NoSchedule 390 - 391 - # Additional affinity rules (merged with architecture affinity) 392 - affinity: 393 - nodeAffinity: 394 - preferredDuringSchedulingIgnoredDuringExecution: 395 - - weight: 100 396 - preference: 397 - matchExpressions: 398 - - key: node-role.kubernetes.io/worker 399 - operator: In 400 - values: [true] 401 - ``` 402 - 403 - **Note**: Currently the SpindleSet is mainly used to trigger Service creation. The spindle server configuration comes from environment variables in the operator Deployment. 404 - 405 - ## Complete Deployment Example 406 - 407 - See `config/manager/manager.yaml` for the full operator deployment manifest. 408 - 409 - Minimal example: 410 - 411 - ```yaml 412 - apiVersion: v1 413 - kind: Namespace 414 - metadata: 415 - name: loom-system 416 - 417 - --- 418 - apiVersion: v1 419 - kind: ServiceAccount 420 - metadata: 421 - name: loom-controller-manager 422 - namespace: loom-system 423 - 424 - --- 425 - # RBAC manifests from config/rbac/... 
426 - 427 - --- 428 - apiVersion: apps/v1 429 - kind: Deployment 430 - metadata: 431 - name: loom-controller-manager 432 - namespace: loom-system 433 - spec: 434 - replicas: 1 435 - selector: 436 - matchLabels: 437 - control-plane: controller-manager 438 - template: 439 - metadata: 440 - labels: 441 - control-plane: controller-manager 442 - spec: 443 - serviceAccountName: loom-controller-manager 444 - containers: 445 - - name: manager 446 - image: loom:latest 447 - command: 448 - - /manager 449 - env: 450 - - name: SPINDLE_SERVER_LISTEN_ADDR 451 - value: "0.0.0.0:6555" 452 - - name: SPINDLE_SERVER_DB_PATH 453 - value: "/data/spindle.db" 454 - - name: SPINDLE_SERVER_HOSTNAME 455 - value: "loom.example.com" 456 - - name: SPINDLE_SERVER_OWNER 457 - value: "did:web:example.com" 458 - - name: SPINDLE_SERVER_JETSTREAM_ENDPOINT 459 - value: "wss://jetstream1.us-west.bsky.network/subscribe" 460 - - name: SPINDLE_SERVER_QUEUE_SIZE 461 - value: "100" 462 - - name: SPINDLE_SERVER_MAX_JOB_COUNT 463 - value: "2" 464 - ports: 465 - - containerPort: 6555 466 - name: http 467 - - containerPort: 8081 468 - name: healthz 469 - livenessProbe: 470 - httpGet: 471 - path: /healthz 472 - port: healthz 473 - readinessProbe: 474 - httpGet: 475 - path: /readyz 476 - port: healthz 477 - volumeMounts: 478 - - name: data 479 - mountPath: /data 480 - volumes: 481 - - name: data 482 - emptyDir: {} 483 - 484 - --- 485 - apiVersion: loom.j5t.io/v1alpha1 486 - kind: SpindleSet 487 - metadata: 488 - name: default-spindle 489 - namespace: loom-system 490 - spec: 491 - knotUrl: https://knot1.tangled.sh 492 - maxConcurrentJobs: 5 493 - ``` 494 - 495 - ## Testing Connectivity 496 - 497 - ### 1. Check Service Created 498 - ```bash 499 - kubectl get svc -A | grep spindle 500 - ``` 501 - 502 - Should show: `default-spindle-spindle` 503 - 504 - ### 2. Port Forward to Access Locally 505 - ```bash 506 - kubectl port-forward -n loom-system svc/default-spindle-spindle 6555:6555 507 - ``` 508 - 509 - ### 3. 
Test /events WebSocket 510 - ```bash 511 - websocat ws://localhost:6555/events 512 - ``` 513 - 514 - Should connect and start streaming events (or stay connected waiting for events). 515 - 516 - ### 4. Check Logs 517 - ```bash 518 - kubectl logs -n loom-system -l control-plane=controller-manager 519 - ``` 520 - 521 - Should show: 522 - - "spindle server initialized successfully" 523 - - "starting spindle HTTP server" 524 - - "Spindle HTTP service created successfully" 525 - 526 - ## Troubleshooting 527 - 528 - ### "spindle server error: address already in use" 529 - Port 6555 is already bound. Check if another process is using it or if you have multiple operator replicas. 530 - 531 - ### "failed to load spindle config: SPINDLE_SERVER_HOSTNAME is required" 532 - Missing required environment variable. Check your Deployment manifest. 533 - 534 - ### Service not created 535 - Check controller logs for errors. Ensure RBAC permissions for Services are granted. 536 - 537 - ### Jobs not creating 538 - - Check spindle can reach knot WebSocket 539 - - Verify EventConsumer is running (logs should show "starting knot event consumer") 540 - - Check if repos are registered in spindle database 541 - 542 - ### Database locked errors 543 - SQLite can't be shared across multiple pods. Set `replicas: 1` or use a different database solution. 544 - 545 - ## Next Steps 546 - 547 - - [Architecture Overview](./ARCHITECTURE.md) - Understand how Loom works 548 - - [Workflow Configuration](./WORKFLOWS.md) - Write workflow YAML files 549 - - [Upstream Improvements](./TANGLED.md) - Contribute to tangled.org/core
+1 -1
Dockerfile
··· 49 49 org.opencontainers.image.authors="Evan Jarrett" \ 50 50 org.opencontainers.image.source="https://tangled.org/evan.jarrett.net/loom" \ 51 51 org.opencontainers.image.documentation="https://tangled.org/evan.jarrett.net/loom" \ 52 - org.opencontainers.image.licenses="MIT" \ 52 + org.opencontainers.image.licenses="Apache-2.0" \ 53 53 org.opencontainers.image.version="latest" 54 54 55 55 ENTRYPOINT ["/manager"]
-416
PLAN.md
··· 1 - # Loom Kubernetes Operator - Implementation Plan 2 - 3 - ## Project Overview 4 - 5 - Loom is a Kubernetes operator for coordinating tangled.org Spindles - ephemeral CI/CD runners that execute pipelines in response to events from tangled.org knots. Inspired by GitHub's Actions Runner Controller (ARC) but adapted for tangled.org's AT Protocol-based, event-driven architecture. 6 - 7 - ## Architecture 8 - 9 - ### Core Principles 10 - - **Ephemeral Spindles**: One Kubernetes Job per pipeline event (scale-to-zero) 11 - - **Event-Driven**: WebSocket connection to tangled.org knot for pipeline events 12 - - **Code Reuse**: Leverage `tangled.org/core/spindle` for WebSocket, models, interfaces 13 - - **Simple Images**: Use standard Docker images (golang:1.24, node:20, etc.) - no Nixery for MVP 14 - - **Multi-Arch Support**: Schedule jobs on amd64/arm64 nodes based on workflow specification 15 - - **New Component**: Kubernetes-native Engine that spawns Jobs instead of Docker containers 16 - 17 - ### Key Components 18 - 19 - 1. **SpindleSet CRD**: Configures connection to tangled.org knot and job templates 20 - 2. **SpindleSet Controller**: Maintains WebSocket connection, handles pipeline events 21 - 3. **KubernetesEngine**: Implements tangled.org's Engine interface for Kubernetes Jobs 22 - 4. **Job Builder**: Generates Job specs with multi-arch node affinity 23 - 5. **Log Streamer**: Streams pod logs to knot via Kubernetes API 24 - 6. 
**Status Reporter**: Reports workflow status back to tangled.org 25 - 26 - --- 27 - 28 - ## Phase 1: CRD Design & Basic Structure 29 - 30 - ### SpindleSet CRD 31 - ```yaml 32 - apiVersion: loom.j5t.io/v1alpha1 33 - kind: SpindleSet 34 - metadata: 35 - name: tangled-org-spindle 36 - spec: 37 - # Knot configuration 38 - knotUrl: https://tangled.org/@org/repo 39 - knotAuthSecret: spindle-auth # Secret with auth token 40 - 41 - # Scaling configuration 42 - maxConcurrentJobs: 10 43 - 44 - # Default template (can be overridden by workflow) 45 - template: 46 - resources: 47 - requests: 48 - cpu: 500m 49 - memory: 1Gi 50 - limits: 51 - cpu: 2 52 - memory: 4Gi 53 - 54 - # Node targeting defaults 55 - nodeSelector: {} 56 - tolerations: [] 57 - affinity: {} 58 - ``` 59 - 60 - ### Status Fields 61 - - `conditions`: Standard Kubernetes conditions 62 - - `pendingJobs`, `runningJobs`: Current job counts 63 - - `completedJobs`, `failedJobs`: Cumulative counters 64 - - `webSocketConnected`: WebSocket connection status 65 - - `lastEventTime`: Last received event timestamp 66 - 67 - --- 68 - 69 - ## Phase 2: Kubernetes Engine Implementation 70 - 71 - ### Workflow File Format 72 - ```yaml 73 - # In tangled.org repository's .tangled/pipeline.yaml 74 - image: golang:1.24-bookworm 75 - architecture: amd64 # or arm64 76 - 77 - steps: 78 - - name: run tests 79 - command: | 80 - go test -v ./... 
81 - 82 - - name: build binary 83 - command: | 84 - go build -o app ./cmd 85 - ``` 86 - 87 - ### Job Pod Structure 88 - - **Init container**: Clone repository from tangled.org 89 - - **Main container**: 90 - - Image: `{workflow.image}` (e.g., `golang:1.24-bookworm`) 91 - - Platform: `linux/{architecture}` 92 - - Execute all steps sequentially 93 - - **Volumes**: 94 - - `/tangled/workspace` - Shared workspace (emptyDir) 95 - - `/tmp/step-outputs` - Step output communication 96 - - `/tmp/github` - GITHUB_ENV-style env passing 97 - - **Node Affinity**: Based on `architecture` field 98 - 99 - ### Multi-Architecture Support 100 - ```go 101 - func (e *KubernetesEngine) buildJobAffinity(arch string) *corev1.Affinity { 102 - return &corev1.Affinity{ 103 - NodeAffinity: &corev1.NodeAffinity{ 104 - RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{ 105 - NodeSelectorTerms: []corev1.NodeSelectorTerm{ 106 - { 107 - MatchExpressions: []corev1.NodeSelectorRequirement{ 108 - { 109 - Key: "kubernetes.io/arch", 110 - Operator: corev1.NodeSelectorOpIn, 111 - Values: []string{arch}, // amd64 or arm64 112 - }, 113 - }, 114 - }, 115 - }, 116 - }, 117 - }, 118 - } 119 - } 120 - ``` 121 - 122 - ### Step Execution Model 123 - Generate bash script that executes all steps sequentially: 124 - - GitHub Actions-compatible environment variables (`GITHUB_ENV`, `GITHUB_OUTPUT`) 125 - - Environment passing between steps 126 - - Error handling and exit on failure 127 - - Step-level logging with timestamps 128 - 129 - --- 130 - 131 - ## Phase 3: WebSocket Integration & Event Handling 132 - 133 - ### WebSocket Client (Reuse from core/spindle) 134 - - Connect to `{knotUrl}/spindle/events` 135 - - Handle cursor-based backfill for missed events 136 - - Subscribe to live `sh.tangled.pipeline` events 137 - - Exponential backoff on connection failures 138 - 139 - ### Event Handler → Job Creation 140 - 1. Parse pipeline event payload 141 - 2. 
Extract workflow definition, repo, commit SHA 142 - 3. Create Kubernetes Job with: 143 - - Correct architecture node affinity 144 - - Image from workflow spec 145 - - Steps as bash script 146 - - Owner reference to SpindleSet (for cleanup) 147 - 4. Label Job with pipeline metadata 148 - 149 - ### SpindleSet Controller Reconciliation 150 - - Establish WebSocket connection to knot 151 - - Subscribe to pipeline events 152 - - Create Jobs on event received 153 - - Monitor running Jobs 154 - - Update SpindleSet status 155 - - Handle connection failures 156 - 157 - --- 158 - 159 - ## Phase 4: Status Reporting & Observability 160 - 161 - ### Job Status Tracking 162 - Watch Job events via Kubernetes API: 163 - - Job created → Report "running" to knot 164 - - Job succeeded → Report "success" to knot 165 - - Job failed → Report "failure" with error to knot 166 - - Job timeout → Report "timeout" to knot 167 - 168 - ### Status Reporting to Knot 169 - Reuse `spindle/db` status update patterns: 170 - - `StatusRunning()` - When Job starts 171 - - `StatusSuccess()` - When Job succeeds 172 - - `StatusFailed()` - When Job fails with error message 173 - - `StatusTimeout()` - When Job exceeds timeout 174 - 175 - ### Prometheus Metrics 176 - ```go 177 - loom_pending_spindles // Gauge: jobs pending 178 - loom_running_spindles // Gauge: jobs running 179 - loom_completed_spindles_total // Counter: total completed 180 - loom_failed_spindles_total // Counter: total failed 181 - loom_pipeline_duration_seconds // Histogram: execution duration 182 - ``` 183 - 184 - Exposed via controller-runtime's metrics server. 185 - 186 - --- 187 - 188 - ## Phase 5: Log Streaming via Kubernetes API 189 - 190 - ### Implementation 191 - ```go 192 - func (e *KubernetesEngine) StreamLogsToKnot(ctx context.Context, jobName string, knotClient *KnotClient) { 193 - // 1. Get pod for job 194 - // 2. Stream logs via K8s API 195 - // 3. 
Forward each line to knot in real-time 196 - } 197 - ``` 198 - 199 - ### Log Format 200 - Send to knot in tangled.org spindle format: 201 - ```json 202 - { 203 - "kind": "data", // or "control" 204 - "content": "test output line", 205 - "stepId": 0, 206 - "stepKind": "user" 207 - } 208 - ``` 209 - 210 - --- 211 - 212 - ## Phase 6: Testing & Deployment 213 - 214 - ### Unit Tests 215 - - Job template generation with different architectures 216 - - Node affinity generation (amd64 vs arm64) 217 - - Step script builder 218 - - Mock WebSocket client 219 - 220 - ### Integration Tests 221 - ```go 222 - // Test with real cluster 223 - func TestE2E_SimpleGoPipeline(t *testing.T) { 224 - // 1. Deploy SpindleSet CR 225 - // 2. Send test pipeline event 226 - // 3. Verify Job created on correct arch node 227 - // 4. Wait for completion 228 - // 5. Check logs streamed to knot 229 - } 230 - ``` 231 - 232 - ### Manual Testing 233 - ```bash 234 - # Deploy operator 235 - make deploy IMG=ghcr.io/you/loom:v0.1.0 236 - 237 - # Create SpindleSet 238 - kubectl apply -f config/samples/spindleset_sample.yaml 239 - 240 - # Push code to tangled.org with .tangled/pipeline.yaml 241 - 242 - # Watch Jobs 243 - kubectl get jobs -l loom.j5t.io/spindleset=test-spindle -w 244 - 245 - # Check pod placement 246 - kubectl get pods -o wide 247 - 248 - # View logs 249 - kubectl logs -f job/runner-<hash> 250 - ``` 251 - 252 - --- 253 - 254 - ## File Structure 255 - 256 - ``` 257 - loom/ 258 - ├── api/v1alpha1/ 259 - │ ├── spindleset_types.go # SpindleSet CRD 260 - │ └── groupversion_info.go 261 - 262 - ├── internal/ 263 - │ ├── controller/ 264 - │ │ └── spindleset_controller.go # Main reconciliation loop 265 - │ │ 266 - │ └── engine/ 267 - │ └── kubernetes_engine.go # K8s-native Engine implementation 268 - 269 - ├── pkg/ 270 - │ ├── ingester/ 271 - │ │ └── websocket.go # WebSocket client (adapted from core) 272 - │ │ 273 - │ ├── jobbuilder/ 274 - │ │ ├── job_template.go # Generate Job specs 275 - │ │ ├── 
affinity.go # Multi-arch node affinity 276 - │ │ └── script_builder.go # Step execution script 277 - │ │ 278 - │ └── knot/ 279 - │ └── client.go # Knot API client for status/logs 280 - 281 - ├── config/ 282 - │ ├── crd/ # Generated CRD manifests 283 - │ ├── rbac/ # RBAC for Job CRUD 284 - │ └── samples/ 285 - │ └── spindleset_sample.yaml 286 - 287 - └── cmd/main.go # Operator entrypoint 288 - ``` 289 - 290 - --- 291 - 292 - ## Dependencies 293 - 294 - ### From tangled.org/core 295 - ```go 296 - import ( 297 - "tangled.org/core/spindle/models" // Engine interface 298 - "tangled.org/core/spindle/config" // Config models 299 - "tangled.org/core/api/tangled" // Pipeline types 300 - // Adapt WebSocket logic from spindle/stream.go, ingester.go 301 - ) 302 - ``` 303 - 304 - ### Kubernetes 305 - ```go 306 - import ( 307 - batchv1 "k8s.io/api/batch/v1" 308 - corev1 "k8s.io/api/core/v1" 309 - "sigs.k8s.io/controller-runtime/pkg/client" 310 - ) 311 - ``` 312 - 313 - ### Metrics 314 - ```go 315 - import ( 316 - "github.com/prometheus/client_golang/prometheus" 317 - "sigs.k8s.io/controller-runtime/pkg/metrics" 318 - ) 319 - ``` 320 - 321 - --- 322 - 323 - ## Implementation Order 324 - 325 - 1. ✅ Create SpindleSet CRD (API types, generate manifests) 326 - 2. ⏳ Implement Job builder (template generation, multi-arch affinity) 327 - 3. ⏳ Implement KubernetesEngine (Engine interface for K8s Jobs) 328 - 4. ⏳ Import WebSocket client (adapt from core/spindle) 329 - 5. ⏳ Implement SpindleSet controller (reconciliation + event handling) 330 - 6. ⏳ Add Job status monitoring (watch Jobs, report to knot) 331 - 7. ⏳ Add log streaming (K8s API → knot) 332 - 8. ⏳ Add Prometheus metrics (instrument controller) 333 - 9. ⏳ Testing (unit + integration tests) 334 - 10. 
⏳ Documentation (usage guide, architecture diagrams) 335 - 336 - --- 337 - 338 - ## MVP Scope 339 - 340 - ### Include ✅ 341 - - SpindleSet CRD with knot configuration 342 - - WebSocket connection to knot 343 - - Kubernetes Job creation per pipeline event 344 - - Multi-architecture support (amd64/arm64 node targeting) 345 - - Standard Docker images (golang:1.24, node:20, etc.) 346 - - Sequential step execution in single pod 347 - - Log streaming from K8s pods to knot via K8s API 348 - - Status reporting to knot (success/failure/timeout) 349 - - Prometheus metrics 350 - 351 - ### Exclude (Future Enhancements) ❌ 352 - - Nixery integration (add later) 353 - - Kaniko/Buildah for container builds 354 - - Persistent Nix store caching 355 - - Multi-knot support 356 - - Advanced auto-scaling policies 357 - - Service containers (DB sidecars) 358 - - Matrix builds 359 - 360 - --- 361 - 362 - ## Key Design Decisions 363 - 364 - 1. **Ephemeral Jobs**: Scale-to-zero, one Job per pipeline event 365 - 2. **Simple Images**: Use any Docker Hub image, no Nixery complexity for MVP 366 - 3. **Multi-Arch Native**: Use Kubernetes node affinity for amd64/arm64 targeting 367 - 4. **All steps in one pod**: GitHub Actions model (shared filesystem/env) 368 - 5. **K8s API for logs**: Stream pod logs to knot, no disk-based logging needed 369 - 6. **Reuse spindle models**: Maintain compatibility, adapt only execution layer 370 - 7. 
**Prometheus metrics**: Standard observability from day one 371 - 372 - --- 373 - 374 - ## Future Enhancements 375 - 376 - ### Phase 7: Nixery Integration 377 - - Detect `dependencies.nixpkgs` in workflow spec 378 - - Generate Nixery image URL dynamically 379 - - Support both standard images and Nixery 380 - - Implement Nix store caching (PVC) 381 - 382 - ### Phase 8: Advanced Features 383 - - ✅ **Buildah integration for container builds** (MVP completed - no caching) 384 - - Service containers (like GitHub Actions services) 385 - - Matrix builds (multiple arch/version combinations) 386 - - Caching strategies (build cache, dependencies) 387 - - Advanced auto-scaling (predictive scaling) 388 - 389 - ### Phase 9: Multi-Tenancy 390 - - Multiple SpindleSets per cluster 391 - - Resource quotas per SpindleSet 392 - - Network policies for isolation 393 - - Multi-knot support (one operator, many knots) 394 - 395 - --- 396 - 397 - ## Success Criteria 398 - 399 - **MVP is complete when:** 400 - 1. SpindleSet CRD can be deployed to cluster 401 - 2. WebSocket connection to tangled.org knot established 402 - 3. Pipeline events trigger Job creation 403 - 4. Jobs execute on correct architecture nodes 404 - 5. Logs stream back to knot in real-time 405 - 6. Status updates sent to knot (success/failure) 406 - 7. Prometheus metrics exposed 407 - 8. Basic integration test passes 408 - 409 - **Production-ready when:** 410 - 1. Full test coverage (unit + integration) 411 - 2. Error handling and retry logic robust 412 - 3. Documentation complete 413 - 4. Helm chart available 414 - 5. Multi-arch container images published 415 - 6. Performance benchmarked 416 - 7. Security review completed
+154 -85
README.md
··· 1 - # loom 2 - // TODO(user): Add simple overview of use/purpose 1 + # Loom 3 2 4 - ## Description 5 - // TODO(user): An in-depth paragraph about your project and overview of use 3 + Loom is a Kubernetes operator that runs CI/CD pipeline workflows from [tangled.org](https://tangled.org). It creates ephemeral Jobs in response to events (pushes, pull requests) and streams logs back to the tangled.org platform. 6 4 7 - ## Getting Started 5 + ## Architecture 8 6 9 - ### Prerequisites 10 - - go version v1.24.0+ 11 - - docker version 17.03+. 12 - - kubectl version v1.11.3+. 13 - - Access to a Kubernetes v1.11.3+ cluster. 7 + ``` 8 + ┌─────────────────────────────────────────────────────────────┐ 9 + │ Loom Operator Pod │ 10 + │ │ 11 + │ ┌────────────────────────────────────────────────────────┐ │ 12 + │ │ Controller Manager │ │ 13 + │ │ - Watches SpindleSet CRD │ │ 14 + │ │ - Creates/monitors Kubernetes Jobs │ │ 15 + │ └────────────────────────────────────────────────────────┘ │ 16 + │ │ 17 + │ ┌────────────────────────────────────────────────────────┐ │ 18 + │ │ Embedded Spindle Server │ │ 19 + │ │ - WebSocket connection to tangled.org knots │ │ 20 + │ │ - Database, queue, secrets vault │ │ 21 + │ │ - KubernetesEngine (creates Jobs) │ │ 22 + │ └────────────────────────────────────────────────────────┘ │ 23 + └─────────────────────────────────────────────────────────────┘ 24 + 25 + │ creates 26 + 27 + ┌───────────────────────────────┐ 28 + │ Kubernetes Job (per workflow) │ 29 + │ │ 30 + │ Init: setup-user, clone-repo │ 31 + │ Main: runner binary + image │ 32 + └───────────────────────────────┘ 33 + ``` 14 34 15 - ### To Deploy on the cluster 16 - **Build and push your image to the location specified by `IMG`:** 35 + ### Components 17 36 18 - ```sh 19 - make docker-build docker-push IMG=<some-registry>/loom:tag 20 - ``` 37 + **Controller (`cmd/controller`)** - The Kubernetes operator that: 38 + - Connects to tangled.org knots via WebSocket to receive pipeline events 
39 + - Creates `SpindleSet` custom resources for each pipeline run 40 + - Reconciles SpindleSets into Kubernetes Jobs 41 + - Manages secrets injection and cleanup 21 42 22 - **NOTE:** This image ought to be published in the personal registry you specified. 23 - And it is required to have access to pull the image from the working environment. 24 - Make sure you have the proper permission to the registry if the above commands don’t work. 43 + **Runner (`cmd/runner`)** - A lightweight binary injected into workflow pods that: 44 + - Executes workflow steps sequentially 45 + - Emits structured JSON log events for real-time status updates 46 + - Handles step-level environment variable injection 25 47 26 - **Install the CRDs into the cluster:** 48 + ## How It Works 27 49 28 - ```sh 29 - make install 30 - ``` 50 + 1. A push or PR event triggers a pipeline on tangled.org 51 + 2. Loom receives the event via WebSocket and parses the workflow YAML 52 + 3. A `SpindleSet` CR is created with the pipeline specification 53 + 4. The controller creates a Kubernetes Job with: 54 + - Init containers for user setup and repository cloning 55 + - The runner binary injected via shared volume 56 + - The user's workflow image as the main container 57 + 5. The runner executes steps and streams logs back to the controller 58 + 6. 
On completion, the SpindleSet and its resources are cleaned up 31 59 32 - **Deploy the Manager to the cluster with the image specified by `IMG`:** 60 + ## Features 33 61 34 - ```sh 35 - make deploy IMG=<some-registry>/loom:tag 62 + - **Multi-architecture support**: Schedule workflows on amd64 or arm64 nodes 63 + - **Rootless container builds**: Buildah support with user namespace configuration 64 + - **Secret management**: Repository secrets injected as environment variables with log masking 65 + - **Resource profiles**: Configure CPU/memory based on node labels 66 + - **Automatic cleanup**: TTL-based Job cleanup and orphan detection 67 + 68 + ## Configuration 69 + 70 + ### Loom ConfigMap 71 + 72 + Loom is configured via a ConfigMap mounted at `/etc/loom/config.yaml`: 73 + 74 + ```yaml 75 + maxConcurrentJobs: 10 76 + template: 77 + resourceProfiles: 78 + - nodeSelector: 79 + kubernetes.io/arch: amd64 80 + resources: 81 + requests: 82 + cpu: "500m" 83 + memory: "1Gi" 84 + limits: 85 + cpu: "2" 86 + memory: "4Gi" 87 + - nodeSelector: 88 + kubernetes.io/arch: arm64 89 + resources: 90 + requests: 91 + cpu: "500m" 92 + memory: "1Gi" 93 + limits: 94 + cpu: "2" 95 + memory: "4Gi" 36 96 ``` 37 97 38 - > **NOTE**: If you encounter RBAC errors, you may need to grant yourself cluster-admin 39 - privileges or be logged in as admin. 
98 + ### Spindle Environment Variables 99 + 100 + The embedded spindle server is configured via environment variables: 101 + 102 + | Variable | Required | Description | 103 + |----------|----------|-------------| 104 + | `SPINDLE_SERVER_LISTEN_ADDR` | Yes | HTTP server address (e.g., `0.0.0.0:6555`) | 105 + | `SPINDLE_SERVER_DB_PATH` | Yes | SQLite database path | 106 + | `SPINDLE_SERVER_HOSTNAME` | Yes | Hostname for spindle DID | 107 + | `SPINDLE_SERVER_OWNER` | Yes | Owner DID (e.g., `did:web:example.com`) | 108 + | `SPINDLE_SERVER_JETSTREAM_ENDPOINT` | Yes | Bluesky jetstream WebSocket URL | 109 + | `SPINDLE_SERVER_MAX_JOB_COUNT` | No | Max concurrent workflows (default: 2) | 110 + | `SPINDLE_SERVER_SECRETS_PROVIDER` | No | `sqlite` or `openbao` (default: sqlite) | 40 111 41 - **Create instances of your solution** 42 - You can apply the samples (examples) from the config/sample: 112 + ## Workflow Format 43 113 44 - ```sh 45 - kubectl apply -k config/samples/ 46 - ``` 114 + Workflows are defined in `.tangled/workflows/*.yaml` in your repository: 47 115 48 - >**NOTE**: Ensure that the samples has default values to test it out. 116 + ```yaml 117 + image: golang:1.24 118 + architecture: amd64 49 119 50 - ### To Uninstall 51 - **Delete the instances (CRs) from the cluster:** 120 + steps: 121 + - name: Build 122 + command: go build ./... 52 123 53 - ```sh 54 - kubectl delete -k config/samples/ 124 + - name: Test 125 + command: go test ./... 
55 126 ``` 56 127 57 - **Delete the APIs(CRDs) from the cluster:** 128 + ## Security 58 129 59 - ```sh 60 - make uninstall 61 - ``` 130 + ### Job Pod Security 62 131 63 - **UnDeploy the controller from the cluster:** 132 + Jobs run with hardened security contexts: 133 + - Non-root user (UID 1000) 134 + - Minimal capabilities (only SETUID/SETGID for buildah) 135 + - No service account token mounting 136 + - Unconfined seccomp (required for buildah user namespaces) 64 137 65 - ```sh 66 - make undeploy 67 - ``` 138 + ### Secrets 68 139 69 - ## Project Distribution 140 + Repository secrets are: 141 + - Stored in the spindle vault (SQLite or OpenBao) 142 + - Injected as environment variables via Kubernetes Secrets 143 + - Masked in log output 70 144 71 - Following the options to release and provide this solution to the users. 145 + ## Prerequisites 146 + 147 + - go version v1.24.0+ 148 + - docker version 17.03+ 149 + - kubectl version v1.11.3+ 150 + - Access to a Kubernetes v1.11.3+ cluster 72 151 73 - ### By providing a bundle with all YAML files 152 + ## Deployment 74 153 75 - 1. Build the installer for the image built and published in the registry: 154 + Build and push the image: 76 155 77 156 ```sh 78 - make build-installer IMG=<some-registry>/loom:tag 157 + make docker-build docker-push IMG=<registry>/loom:tag 79 158 ``` 80 159 81 - **NOTE:** The makefile target mentioned above generates an 'install.yaml' 82 - file in the dist directory. This file contains all the resources built 83 - with Kustomize, which are necessary to install this project without its 84 - dependencies. 160 + Install the CRDs: 85 161 86 - 2. 
Using the installer 162 + ```sh 163 + make install 164 + ``` 87 165 88 - Users can just run 'kubectl apply -f <URL for YAML BUNDLE>' to install 89 - the project, i.e.: 166 + Deploy the controller: 90 167 91 168 ```sh 92 - kubectl apply -f https://raw.githubusercontent.com/<org>/loom/<tag or branch>/dist/install.yaml 169 + make deploy IMG=<registry>/loom:tag 93 170 ``` 94 171 95 - ### By providing a Helm Chart 172 + ## Development 96 173 97 - 1. Build the chart using the optional helm plugin 174 + Generate CRDs and code: 98 175 99 176 ```sh 100 - operator-sdk edit --plugins=helm/v1-alpha 177 + make manifests generate 101 178 ``` 102 179 103 - 2. See that a chart was generated under 'dist/chart', and users 104 - can obtain this solution from there. 180 + Run tests: 181 + 182 + ```sh 183 + make test 184 + ``` 105 185 106 - **NOTE:** If you change the project, you need to update the Helm Chart 107 - using the same command above to sync the latest changes. Furthermore, 108 - if you create webhooks, you need to use the above command with 109 - the '--force' flag and manually ensure that any custom configuration 110 - previously added to 'dist/chart/values.yaml' or 'dist/chart/manager/manager.yaml' 111 - is manually re-applied afterwards. 186 + Run locally (for debugging): 112 187 113 - ## Contributing 114 - // TODO(user): Add detailed information on how you would like others to contribute to this project 188 + ```sh 189 + make install run 190 + ``` 115 191 116 - **NOTE:** Run `make help` for more information on all potential `make` targets 192 + ## Uninstall 117 193 118 - More information can be found via the [Kubebuilder Documentation](https://book.kubebuilder.io/introduction.html) 194 + ```sh 195 + kubectl delete -k config/samples/ 196 + make uninstall 197 + make undeploy 198 + ``` 119 199 120 200 ## License 121 201 122 202 Copyright 2025 Evan Jarrett. 
123 203 124 - Licensed under the Apache License, Version 2.0 (the "License"); 125 - you may not use this file except in compliance with the License. 126 - You may obtain a copy of the License at 127 - 128 - http://www.apache.org/licenses/LICENSE-2.0 129 - 130 - Unless required by applicable law or agreed to in writing, software 131 - distributed under the License is distributed on an "AS IS" BASIS, 132 - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 133 - See the License for the specific language governing permissions and 134 - limitations under the License. 135 - 204 + Licensed under the Apache License, Version 2.0.
TANGLED.md docs/proposals/TANGLED.md