# Kubernetes Operator for GitHub Actions Runner
Created at 2025-11-03 05:24:53 UTC

> @ 2025-11-03 05:24:53 UTC
>
> I want to implement a kubernetes operator similar to github actions runner controller. I want to primarily focus on the logic for spawning a “runner” container via jobs and polling logs as it’s processing. I also want to make sure i can build docker containers within these runners. And the jobs should have logic to target nodes based on os architecture. This is supposed to be a rough poc, i already have go logic that works for reading a yaml file and figuring out what workflow to run. It just needs to send that job/step to a job in kubes

<details><summary>Thinking</summary>

The user is asking for help implementing a Kubernetes operator similar to GitHub Actions runner controller. They want to:
1. Spawn runner containers via Jobs
2. Poll logs as it's processing
3. Build docker containers within runners
4. Target nodes based on OS architecture
5. This is a rough POC and they already have Go logic to read YAML and figure out what workflow to run

I'll provide a comprehensive answer covering the implementation of these requirements.

</details>

I'll help you implement a Kubernetes operator for spawning runner containers similar to GitHub Actions Runner Controller. Here's a comprehensive approach for your POC:

## **Kubernetes Job Spawning Logic**

Here's the core Go code structure for creating and managing Jobs:

```go
package operator

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type RunnerController struct {
	clientset *kubernetes.Clientset
	namespace string
}

func (rc *RunnerController) CreateRunnerJob(workflowID, stepName, stepCommand, arch string) (*batchv1.Job, error) {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			// Note: Job names must be valid DNS labels (lowercase, <= 63 chars)
			Name:      fmt.Sprintf("runner-%s-%s", workflowID, stepName),
			Namespace: rc.namespace,
			Labels: map[string]string{
				"workflow-id": workflowID,
				"step-name":   stepName,
				"type":        "runner",
			},
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"workflow-id": workflowID,
						"step-name":   stepName,
					},
				},
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					NodeSelector: map[string]string{
						"kubernetes.io/arch": arch, // amd64, arm64, etc.
					},
					Containers: []corev1.Container{
						{
							Name:    "runner",
							Image:   "docker:dind", // Docker-in-Docker image
							Command: []string{"/bin/sh", "-c"},
							Args:    []string{stepCommand},
							Env: []corev1.EnvVar{
								{
									Name:  "DOCKER_HOST",
									Value: "tcp://localhost:2375",
								},
							},
							VolumeMounts: []corev1.VolumeMount{
								{
									Name:      "docker-graph-storage",
									MountPath: "/var/lib/docker",
								},
							},
						},
						{
							// Caveat: dockerd never exits on its own, so this sidecar
							// keeps the pod Running after the runner finishes. Fine for
							// a POC; for real use, have the runner stop the daemon or
							// use native sidecar support on newer Kubernetes versions.
							Name:    "dind-daemon",
							Image:   "docker:dind",
							Command: []string{"dockerd"},
							Args: []string{
								"--host=tcp://0.0.0.0:2375",
								"--storage-driver=overlay2",
							},
							SecurityContext: &corev1.SecurityContext{
								Privileged: &[]bool{true}[0],
							},
							VolumeMounts: []corev1.VolumeMount{
								{
									Name:      "docker-graph-storage",
									MountPath: "/var/lib/docker",
								},
							},
						},
					},
					Volumes: []corev1.Volume{
						{
							Name: "docker-graph-storage",
							VolumeSource: corev1.VolumeSource{
								EmptyDir: &corev1.EmptyDirVolumeSource{},
							},
						},
					},
				},
			},
		},
	}

	return rc.clientset.BatchV1().Jobs(rc.namespace).Create(context.TODO(), job, metav1.CreateOptions{})
}
```

## **Docker-in-Docker Configuration**

For building Docker containers within your runners, you have two main approaches [^6][^2]:

### **Option 1: Docker-in-Docker (DinD)**
The example above uses a sidecar container running the `docker:dind` image. This provides full Docker functionality but requires **privileged mode**.

### **Option 2: Kaniko (Rootless Alternative)**
For better security, consider Kaniko, which builds images without a Docker daemon:

```go
// Alternative container spec for Kaniko
{
	Name:  "runner",
	Image: "gcr.io/kaniko-project/executor:latest",
	Args: []string{
		"--dockerfile=Dockerfile",
		"--context=git://github.com/your-repo.git",
		"--destination=your-registry/image:tag",
	},
}
```

## **Log Polling Implementation**

Here's how to poll and stream logs from your runner Jobs [^1][^3]:

```go
import (
	"bufio"
	"context"
	"fmt"
	"io"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func (rc *RunnerController) PollJobLogs(jobName string) error {
	// Poll until the Job's pod exists (a watch would be cleaner than sleeping)
	labelSelector := fmt.Sprintf("job-name=%s", jobName)
	var podName string
	for i := 0; i < 30; i++ {
		pods, err := rc.clientset.CoreV1().Pods(rc.namespace).List(
			context.TODO(),
			metav1.ListOptions{LabelSelector: labelSelector},
		)
		if err != nil {
			return err
		}
		if len(pods.Items) > 0 {
			podName = pods.Items[0].Name
			break
		}
		time.Sleep(2 * time.Second)
	}
	if podName == "" {
		return fmt.Errorf("no pods found for job %s", jobName)
	}

	// Stream logs
	req := rc.clientset.CoreV1().Pods(rc.namespace).GetLogs(podName, &corev1.PodLogOptions{
		Follow:     true,
		Container:  "runner",
		Timestamps: true,
	})

	stream, err := req.Stream(context.TODO())
	if err != nil {
		return err
	}
	defer stream.Close()

	// Read and process logs line by line
	reader := bufio.NewReader(stream)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			if err == io.EOF {
				break
			}
			return err
		}
		// Process log line (send to a channel, write to a file, etc.)
		fmt.Print(line)
	}

	return nil
}
```

## **Architecture-Based Node Targeting**

To target specific node architectures, use **nodeSelector** or **nodeAffinity** [^1]:

```go
// Enhanced node selection with affinity
Spec: corev1.PodSpec{
	Affinity: &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{
					{
						MatchExpressions: []corev1.NodeSelectorRequirement{
							{
								Key:      "kubernetes.io/arch",
								Operator: corev1.NodeSelectorOpIn,
								Values:   []string{arch}, // "amd64", "arm64"
							},
							{
								Key:      "kubernetes.io/os",
								Operator: corev1.NodeSelectorOpIn,
								Values:   []string{"linux"},
							},
						},
					},
				},
			},
		},
	},
	// ... rest of pod spec
}
```

## **Job Monitoring and Cleanup**

Add logic to monitor Job completion and clean up afterwards:

```go
func (rc *RunnerController) WaitForJobCompletion(jobName string, timeout time.Duration) (*batchv1.Job, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("job %s timed out", jobName)
		case <-ticker.C:
			job, err := rc.clientset.BatchV1().Jobs(rc.namespace).Get(
				ctx, jobName, metav1.GetOptions{},
			)
			if err != nil {
				return nil, err
			}

			if job.Status.Succeeded > 0 {
				return job, nil
			}
			if job.Status.Failed > 0 {
				return job, fmt.Errorf("job %s failed", jobName)
			}
		}
	}
}

func (rc *RunnerController) CleanupJob(jobName string) error {
	// Background propagation deletes the Job's pods as well
	propagationPolicy := metav1.DeletePropagationBackground
	return rc.clientset.BatchV1().Jobs(rc.namespace).Delete(
		context.TODO(),
		jobName,
		metav1.DeleteOptions{
			PropagationPolicy: &propagationPolicy,
		},
	)
}
```
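
If explicit deletion feels heavy for a POC, the Job API can also garbage-collect finished Jobs for you. A sketch using the `TTLSecondsAfterFinished` field on `batchv1.JobSpec` (the 10-minute value is arbitrary):

```go
// Let the TTL-after-finished controller delete the Job (and its pods)
// ten minutes after it completes, instead of calling CleanupJob manually.
ttl := int32(600)
job.Spec.TTLSecondsAfterFinished = &ttl
```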

## **Integration with Your Workflow Parser**

Here's how to tie it all together with your existing YAML parser:

```go
type WorkflowExecutor struct {
	controller *RunnerController
}

func (we *WorkflowExecutor) ExecuteWorkflow(workflow *WorkflowSpec) error {
	for _, step := range workflow.Steps {
		// Create a Job for the step
		job, err := we.controller.CreateRunnerJob(
			workflow.ID,
			step.Name,
			step.Command,
			step.Architecture, // from your YAML
		)
		if err != nil {
			return err
		}

		// Start log polling in a goroutine
		go we.controller.PollJobLogs(job.Name)

		// Wait for completion
		if _, err := we.controller.WaitForJobCompletion(job.Name, 30*time.Minute); err != nil {
			return err
		}

		// Optional: cleanup
		if workflow.CleanupOnComplete {
			we.controller.CleanupJob(job.Name)
		}
	}
	return nil
}
```

## **Key Considerations**

- **Security**: Running Docker-in-Docker requires privileged containers. Consider Kaniko or Buildah for rootless builds
- **Resource Management**: Set appropriate resource limits on Jobs to prevent resource exhaustion
- **Persistent Storage**: For caching Docker layers between builds, consider PersistentVolumes instead of EmptyDir
- **RBAC**: Ensure your operator has proper ServiceAccount permissions to create/delete Jobs and read logs
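
To make the RBAC point concrete, here's a minimal namespace-scoped Role sketch built with `k8s.io/api/rbac/v1`; the Role name and exact verb set are illustrative, not prescriptive:

```go
// Minimal Role for the operator's ServiceAccount: manage Jobs,
// list the pods they create, and read those pods' logs.
role := &rbacv1.Role{
	ObjectMeta: metav1.ObjectMeta{Name: "runner-operator", Namespace: namespace},
	Rules: []rbacv1.PolicyRule{
		{APIGroups: []string{"batch"}, Resources: []string{"jobs"}, Verbs: []string{"create", "get", "list", "watch", "delete"}},
		{APIGroups: []string{""}, Resources: []string{"pods"}, Verbs: []string{"get", "list", "watch"}},
		{APIGroups: []string{""}, Resources: []string{"pods/log"}, Verbs: []string{"get"}},
	},
}
```

Bind it to the operator's ServiceAccount with a matching RoleBinding.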

This POC gives you a solid foundation similar to the GitHub Actions Runner Controller [^4][^5], with the core functionality you need for spawning runners, building containers, and monitoring execution.

_References_:
[^1]: [Jobs | Kubernetes](https://kubernetes.io/docs/concepts/workloads/controllers/job/)
[^2]: [Jenkins: running workers in Kubernetes and Docker images build](https://dev.to/setevoy/jenkins-running-workers-in-kubernetes-and-docker-images-build-1lef)
[^3]: [kubectl logs - Kubernetes](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_logs/)
[^4]: [Kubernetes controller for GitHub Actions self-hosted runners](https://github.com/actions/actions-runner-controller)
[^5]: [GitHub - evryfs/github-actions-runner-operator: K8S operator for...](https://github.com/evryfs/github-actions-runner-operator)
[^6]: [Automating Docker Builds in Kubernetes | by Sebastien Goasguen](https://sebgoa.medium.com/automating-docker-builds-in-kubernetes-15ef047795ab)

> @ 2025-11-03 05:37:24 UTC
>
> Yeah i think i want to support buildah primarily. Not to say we can’t have ways to run alternative images later. Why would i want to choose node affinity over nodeselector?

<details><summary>Thinking</summary>

The user is asking why they would choose node affinity over nodeSelector, and they want to use Buildah primarily for container builds. Let me provide a comprehensive answer about the differences between nodeSelector and nodeAffinity, and then update the code examples to use Buildah instead of Docker-in-Docker.

Based on the research provided:
1. NodeSelector is simpler but less flexible
2. NodeAffinity provides more advanced scheduling options
3. Buildah can run rootless, which is better for security

Let me structure a response that addresses both points clearly.

</details>

Great choice on **Buildah**! It's much more secure since it can run rootless without requiring privileged containers. Let me explain the nodeAffinity vs nodeSelector decision and provide updated code for Buildah.

## **NodeSelector vs NodeAffinity: When to Choose Which**

### **NodeSelector**
- **Simple equality-based selection** [^4][^2]
- Only supports "AND" logic between labels
- More concise for basic requirements
- Example: `kubernetes.io/arch: amd64`

### **NodeAffinity**
NodeAffinity offers **significantly more flexibility**, which you'll likely want for a runner controller [^2][^1]:

1. **Complex expressions**: Use operators like `In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt`, `Lt`
2. **Soft preferences**: Use `preferredDuringSchedulingIgnoredDuringExecution` for optional requirements
3. **Multiple criteria**: Combine multiple conditions with OR logic across `nodeSelectorTerms`
4. **Weighted preferences**: Assign weights to preferred nodes

For your runner controller, **nodeAffinity is better** because:
- You can handle fallback scenarios (prefer arm64 but accept amd64)
- You can exclude certain nodes (avoid nodes with specific labels)
- You can combine architecture with other requirements (GPU nodes, specific OS versions)

## **Buildah-Based Runner Implementation**

Here's the updated implementation using Buildah:

```go
package operator

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type BuildahRunnerConfig struct {
	Registry      string
	RegistryAuth  string // base64-encoded auth
	StorageDriver string // overlay, vfs
	Architecture  string // amd64, arm64
}

func (rc *RunnerController) CreateBuildahRunnerJob(
	workflowID, stepName string,
	buildContext string,
	config BuildahRunnerConfig,
) (*batchv1.Job, error) {

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("runner-%s-%s", workflowID, stepName),
			Namespace: rc.namespace,
			Labels: map[string]string{
				"workflow-id": workflowID,
				"step-name":   stepName,
				"builder":     "buildah",
			},
		},
		Spec: batchv1.JobSpec{
			BackoffLimit: &[]int32{2}[0],
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"workflow-id": workflowID,
						"step-name":   stepName,
					},
					Annotations: map[string]string{
						"container.apparmor.security.beta.kubernetes.io/buildah": "unconfined",
					},
				},
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,

					// Using NodeAffinity for flexible architecture targeting
					Affinity: &corev1.Affinity{
						NodeAffinity: &corev1.NodeAffinity{
							RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
								NodeSelectorTerms: []corev1.NodeSelectorTerm{
									{
										MatchExpressions: []corev1.NodeSelectorRequirement{
											{
												Key:      "kubernetes.io/arch",
												Operator: corev1.NodeSelectorOpIn,
												Values:   []string{config.Architecture},
											},
										},
									},
								},
							},
							// Optional: prefer dedicated runner nodes
							PreferredDuringSchedulingIgnoredDuringExecution: []corev1.PreferredSchedulingTerm{
								{
									Weight: 100,
									Preference: corev1.NodeSelectorTerm{
										MatchExpressions: []corev1.NodeSelectorRequirement{
											{
												Key:      "node-role.kubernetes.io/runner",
												Operator: corev1.NodeSelectorOpExists,
											},
										},
									},
								},
							},
						},
					},

					Containers: []corev1.Container{
						{
							Name:    "buildah",
							Image:   "quay.io/buildah/stable:latest",
							Command: []string{"/bin/bash", "-c"},
							Args: []string{fmt.Sprintf(`
set -e
echo "Starting build process..."

# Clone or prepare build context
%s

# Build the image
buildah bud \
  --storage-driver=%s \
  --format=docker \
  --tls-verify=false \
  -t %s/%s:%s \
  -f Dockerfile .

# Push to registry (if configured)
if [ -n "$REGISTRY_AUTH" ]; then
  echo "$REGISTRY_AUTH" | base64 -d > /tmp/auth.json
  buildah push \
    --authfile /tmp/auth.json \
    --tls-verify=false \
    %s/%s:%s
  rm /tmp/auth.json
fi

echo "Build completed successfully"
`,
								buildContext,
								config.StorageDriver,
								config.Registry, workflowID, stepName,
								config.Registry, workflowID, stepName,
							)},

							SecurityContext: &corev1.SecurityContext{
								// Buildah can run rootless!
								RunAsUser:  &[]int64{1000}[0],
								RunAsGroup: &[]int64{1000}[0],
								Capabilities: &corev1.Capabilities{
									Add: []corev1.Capability{
										"SETUID",
										"SETGID",
									},
								},
							},

							Env: []corev1.EnvVar{
								{
									Name:  "STORAGE_DRIVER",
									Value: config.StorageDriver,
								},
								{
									Name:  "REGISTRY_AUTH",
									Value: config.RegistryAuth,
								},
								{
									Name:  "BUILDAH_ISOLATION",
									Value: "chroot",
								},
							},

							Resources: corev1.ResourceRequirements{
								Requests: corev1.ResourceList{
									corev1.ResourceCPU:    resource.MustParse("500m"),
									corev1.ResourceMemory: resource.MustParse("1Gi"),
								},
								Limits: corev1.ResourceList{
									corev1.ResourceCPU:    resource.MustParse("2"),
									corev1.ResourceMemory: resource.MustParse("4Gi"),
								},
							},

							VolumeMounts: []corev1.VolumeMount{
								{
									Name:      "buildah-storage",
									MountPath: "/var/lib/containers",
								},
							},
						},
					},

					Volumes: []corev1.Volume{
						{
							Name: "buildah-storage",
							VolumeSource: corev1.VolumeSource{
								EmptyDir: &corev1.EmptyDirVolumeSource{
									// RAM-backed for speed; counts against memory limits
									Medium: corev1.StorageMediumMemory,
								},
							},
						},
					},
				},
			},
		},
	}

	return rc.clientset.BatchV1().Jobs(rc.namespace).Create(context.TODO(), job, metav1.CreateOptions{})
}
```

## **Supporting Multiple Build Tools with a Strategy Pattern**

Since you mentioned supporting alternative images later, here's an extensible approach:

```go
// BuildConfig holds whatever build inputs you need (context, registry, arch, ...)
type BuildConfig struct{}

type BuilderStrategy interface {
	CreateJob(workflowID, stepName string, config BuildConfig) *batchv1.Job
	GetImage() string
	GetSecurityContext() *corev1.SecurityContext
}

// CreateJob and GetImage implementations elided for brevity
type BuildahStrategy struct{}
type KanikoStrategy struct{}
type DockerStrategy struct{}

func (b *BuildahStrategy) GetSecurityContext() *corev1.SecurityContext {
	return &corev1.SecurityContext{
		RunAsUser:  &[]int64{1000}[0],
		RunAsGroup: &[]int64{1000}[0],
		Capabilities: &corev1.Capabilities{
			Add: []corev1.Capability{"SETUID", "SETGID"},
		},
	}
}

func (k *KanikoStrategy) GetSecurityContext() *corev1.SecurityContext {
	return &corev1.SecurityContext{
		RunAsUser: &[]int64{0}[0], // Kaniko needs root but not privileged mode
	}
}

// Factory for builder selection
func GetBuilderStrategy(builderType string) BuilderStrategy {
	switch builderType {
	case "kaniko":
		return &KanikoStrategy{}
	case "docker":
		return &DockerStrategy{}
	default:
		return &BuildahStrategy{} // Default to Buildah
	}
}
```

## **Advanced NodeAffinity Example**

Here's a more sophisticated nodeAffinity setup that shows why it's better suited to complex requirements:

```go
Affinity: &corev1.Affinity{
	NodeAffinity: &corev1.NodeAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
			NodeSelectorTerms: []corev1.NodeSelectorTerm{
				{
					MatchExpressions: []corev1.NodeSelectorRequirement{
						{
							Key:      "kubernetes.io/arch",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{"amd64", "arm64"}, // Either architecture
						},
						{
							Key:      "node.kubernetes.io/instance-type",
							Operator: corev1.NodeSelectorOpNotIn,
							Values:   []string{"t2.micro", "t2.small"}, // Exclude small instances
						},
					},
				},
			},
		},
		PreferredDuringSchedulingIgnoredDuringExecution: []corev1.PreferredSchedulingTerm{
			{
				Weight: 100,
				Preference: corev1.NodeSelectorTerm{
					MatchExpressions: []corev1.NodeSelectorRequirement{
						{
							Key:      "kubernetes.io/arch",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{config.PreferredArch}, // Prefer a specific arch
						},
					},
				},
			},
			{
				Weight: 50,
				Preference: corev1.NodeSelectorTerm{
					MatchExpressions: []corev1.NodeSelectorRequirement{
						{
							Key:      "workload-type",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{"build-optimized"},
						},
					},
				},
			},
		},
	},
}
```

**Buildah** is an excellent choice for your runner controller because it:
- Runs without root privileges (better security) [^5][^3]
- Doesn't require a daemon
- Has lower resource overhead
- Supports multiple storage drivers
- Works well in Kubernetes environments

The combination of Buildah with nodeAffinity gives you a secure, flexible runner system that can intelligently place workloads across your cluster.

_References_:
[^1]: [Mastering Node Placement in Kubernetes: Node Selectors vs. Node Affinity](https://medium.com/@rahulbansod519/mastering-node-placement-in-kubernetes-node-selectors-vs-node-affinity-f6a24cda0ddd)
[^2]: [Node Selector vs Node Affinity in Kubernetes - Medium](https://medium.com/@reach2shristi.81/node-selector-vs-node-affinity-in-kubernetes-3292e9552aa0)
[^3]: [Buildah and Podman Relationship | buildah.io](https://buildah.io/blogs/2018/10/31/podman-buildah-relationship.html)
[^4]: [Kubernetes: Node Selector and Node Affinity | by Claire Lee | Medium](https://yuminlee2.medium.com/kubernetes-node-selector-and-node-affinity-ecb3a4d69165)
[^5]: [Building a Buildah container image for Kubernetes](https://buildah.io/blogs/2018/03/01/building-buildah-container-image-for-kubernetes.html)

> @ 2025-11-03 05:43:40 UTC
>
> How does GitHub handle isolation? I believe each step in a job is the same runner, but different jobs are isolated? This code appears to make a single step into a job? Is there shared context between steps? Ie environment variables, cached files etc?

<details><summary>Thinking</summary>

The user is asking about GitHub's isolation model and noticing that my code creates a separate Kubernetes Job for each step, which is different from how GitHub Actions works. They're right - GitHub Actions runs all steps in a job on the same runner with shared context, while different jobs are isolated. Let me explain this architecture and provide updated code that better mimics GitHub's model.

</details>

Excellent observation! You're absolutely right - **GitHub Actions runs all steps within a job on the same runner**, sharing context between steps. The earlier code creates a separate Kubernetes Job per step, which breaks this model. Let me explain GitHub's architecture and provide a better implementation.

## **GitHub Actions Isolation Model**

### **How GitHub Actually Works:**
- **Job level**: Each job gets its own runner (isolated environment) [^1]
- **Step level**: All steps within a job execute sequentially on the same runner [^1]
- **Shared context**: Steps share:
  - Filesystem/workspace
  - Environment variables
  - Docker volumes [^4]
  - Network context
  - Installed tools/dependencies

### **Isolation Boundaries:**
- **Between jobs**: Complete isolation (different runners/pods)
- **Between steps**: No isolation (same container/environment)
- **Between workflows**: Complete isolation

## **Corrected Implementation: Job = Pod, Steps = Sequential Commands**

Here's a redesigned approach that properly mimics GitHub's model:

```go
package operator

import (
	"context"
	"fmt"
	"strings"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type WorkflowJob struct {
	ID           string
	Name         string
	Steps        []Step
	Architecture string
	Environment  map[string]string
}

type Step struct {
	Name    string
	Command string
	Env     map[string]string
	Uses    string // For action references
}

func (rc *RunnerController) CreateJobRunner(job WorkflowJob) (*batchv1.Job, error) {
	// Build a single script that executes all steps sequentially
	runnerScript := rc.buildRunnerScript(job.Steps, job.Environment)

	k8sJob := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("runner-job-%s", job.ID),
			Namespace: rc.namespace,
			Labels: map[string]string{
				"workflow-id": job.ID,
				"job-name":    job.Name,
				"type":        "github-runner",
			},
		},
		Spec: batchv1.JobSpec{
			BackoffLimit: &[]int32{0}[0], // No retries for workflow jobs
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,

					// Single init container to set up the workspace
					InitContainers: []corev1.Container{
						{
							Name:    "setup-workspace",
							Image:   "alpine/git:latest",
							Command: []string{"/bin/sh", "-c"},
							Args: []string{`
# Clone repository or set up workspace
git clone ${REPO_URL} /workspace || true
cd /workspace
git checkout ${GITHUB_SHA:-main}
`},
							VolumeMounts: []corev1.VolumeMount{
								{
									Name:      "workspace",
									MountPath: "/workspace",
								},
							},
						},
					},

					// Main runner container executes ALL steps
					Containers: []corev1.Container{
						{
							Name:    "runner",
							Image:   "quay.io/buildah/stable:latest",
							Command: []string{"/bin/bash", "-c"},
							Args:    []string{runnerScript},

							WorkingDir: "/workspace",

							// buildEnvironment (not shown) merges job-level env with
							// GitHub-style defaults like GITHUB_SHA and REPO_URL
							Env: rc.buildEnvironment(job),

							VolumeMounts: []corev1.VolumeMount{
								{
									Name:      "workspace",
									MountPath: "/workspace",
								},
								{
									Name:      "step-outputs",
									MountPath: "/tmp/step-outputs",
								},
								{
									Name:      "github-env",
									MountPath: "/tmp/github",
								},
								{
									Name:      "buildah-storage",
									MountPath: "/var/lib/containers",
								},
							},

							SecurityContext: &corev1.SecurityContext{
								RunAsUser:  &[]int64{1000}[0],
								RunAsGroup: &[]int64{1000}[0],
							},
						},
					},

					Volumes: []corev1.Volume{
						{
							Name: "workspace",
							VolumeSource: corev1.VolumeSource{
								EmptyDir: &corev1.EmptyDirVolumeSource{},
							},
						},
						{
							Name: "step-outputs",
							VolumeSource: corev1.VolumeSource{
								EmptyDir: &corev1.EmptyDirVolumeSource{},
							},
						},
						{
							Name: "github-env",
							VolumeSource: corev1.VolumeSource{
								EmptyDir: &corev1.EmptyDirVolumeSource{},
							},
						},
						{
							Name: "buildah-storage",
							VolumeSource: corev1.VolumeSource{
								EmptyDir: &corev1.EmptyDirVolumeSource{
									Medium: corev1.StorageMediumMemory,
								},
							},
						},
					},

					NodeSelector: map[string]string{
						"kubernetes.io/arch": job.Architecture,
					},
				},
			},
		},
	}

	return rc.clientset.BatchV1().Jobs(rc.namespace).Create(context.TODO(), k8sJob, metav1.CreateOptions{})
}
```

## **Step Execution Script Builder**

This function creates a bash script that maintains context between steps:

```go
func (rc *RunnerController) buildRunnerScript(steps []Step, jobEnv map[string]string) string {
	var script strings.Builder

	script.WriteString(`#!/bin/bash
set -e

# GitHub Actions compatible environment setup
export GITHUB_ENV=/tmp/github/env
export GITHUB_PATH=/tmp/github/path
export GITHUB_OUTPUT=/tmp/github/output
export GITHUB_STEP_SUMMARY=/tmp/github/step-summary.md

# Initialize files
touch $GITHUB_ENV $GITHUB_PATH $GITHUB_OUTPUT $GITHUB_STEP_SUMMARY

# Helper functions matching GitHub Actions
add_to_env() {
	echo "$1=$2" >> $GITHUB_ENV
	export $1="$2"
}

add_to_path() {
	echo "$1" >> $GITHUB_PATH
	export PATH="$1:$PATH"
}

set_output() {
	echo "$1=$2" >> $GITHUB_OUTPUT
}

# Load environment written by earlier steps
load_github_env() {
	if [ -f "$GITHUB_ENV" ]; then
		set -a
		source $GITHUB_ENV
		set +a
	fi
	if [ -f "$GITHUB_PATH" ]; then
		while IFS= read -r line; do
			export PATH="$line:$PATH"
		done < $GITHUB_PATH
	fi
}

`)

	// Job-level environment variables apply to every step
	for k, v := range jobEnv {
		script.WriteString(fmt.Sprintf("export %s='%s'\n", k, v))
	}

	// Add each step
	for i, step := range steps {
		script.WriteString(fmt.Sprintf(`
echo "==== Starting Step: %s ===="
load_github_env

# Step %d environment variables
`, step.Name, i+1))

		// Add step-specific environment variables
		for k, v := range step.Env {
			script.WriteString(fmt.Sprintf("export %s='%s'\n", k, v))
		}

		// Execute step command
		script.WriteString(fmt.Sprintf(`
# Execute step
(
	%s
) || {
	echo "Step '%s' failed with exit code $?"
	exit 1
}

echo "==== Completed Step: %s ===="
`, step.Command, step.Name, step.Name))
	}

	return script.String()
}
```

## **Shared Context Implementation**

Here's how to share step outputs between steps the way GitHub Actions does [^2][^3]:

```go
type StepOutput struct {
	StepName string
	Outputs  map[string]string
}

// Enhanced step structure with outputs
type EnhancedStep struct {
	Step
	ID      string   // For referencing outputs
	Outputs []string // Expected output keys
}

func (rc *RunnerController) buildEnhancedRunnerScript(steps []EnhancedStep) string {
	var script strings.Builder

	// ... (previous setup code) ...

	for _, step := range steps {
		script.WriteString(fmt.Sprintf(`
# Step: %s (id: %s)
STEP_OUTPUT_FILE="/tmp/step-outputs/%s.json"

# Run step and capture outputs
(
	%s

	# Save step outputs if any were set
	if [ -f "$GITHUB_OUTPUT" ]; then
		# Parse key=value lines and save them as JSON for this step
		python3 -c "
import json
outputs = {}
with open('$GITHUB_OUTPUT', 'r') as f:
    for line in f:
        if '=' in line:
            key, value = line.strip().split('=', 1)
            outputs[key] = value
with open('$STEP_OUTPUT_FILE', 'w') as f:
    json.dump(outputs, f)
"
	fi
)

# Make outputs available as environment variables for later steps
if [ -f "$STEP_OUTPUT_FILE" ]; then
	export STEP_%s_OUTPUTS=$(cat $STEP_OUTPUT_FILE)
fi
`, step.Name, step.ID, step.ID, step.Command, strings.ToUpper(step.ID)))
	}

	return script.String()
}
```

## **Multi-Container Pattern for Complex Workflows**

For workflows that need services (like databases), use a multi-container pod:

```go
// For jobs that need service containers (like GitHub Actions services)
func (rc *RunnerController) CreateJobWithServices(job WorkflowJob, services []ServiceContainer) (*batchv1.Job, error) {
	containers := []corev1.Container{
		// Main runner container
		{
			Name: "runner",
			// ... (as before)
		},
	}

	// Add service containers that run alongside the main runner
	for _, svc := range services {
		containers = append(containers, corev1.Container{
			Name:  svc.Name,
			Image: svc.Image,
			Env:   convertToEnvVar(svc.Environment),
			Ports: svc.Ports,
			// Services run for the lifetime of the job
			Command: svc.Command,
		})
	}

	// Update pod spec with all containers
	// ...
}
```

## **Key Differences from Original Implementation**

| Aspect | GitHub Actions Model | Your Original Code | Corrected Approach |
|--------|---------------------|-------------------|-------------------|
| **Job Mapping** | Job = Runner Instance | Step = K8s Job | Job = K8s Job/Pod |
| **Step Execution** | Sequential in same environment | Separate pods | Sequential in same container |
| **Filesystem** | Shared across steps | Isolated per step | Shared via volumes |
| **Environment Variables** | Persistent across steps | Isolated | Persistent via GITHUB_ENV |
| **Docker/Buildah Cache** | Shared in job | Lost between steps | Shared via volume |

This corrected approach properly mimics GitHub Actions' execution model while maintaining the isolation boundaries that make sense in Kubernetes.

_References_:
[^1]: [Understanding GitHub Actions](https://docs.github.com/articles/getting-started-with-github-actions)
[^2]: [Store information in variables - GitHub Docs](https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/use-variables)
[^3]: [Contexts - GitHub Docs](https://docs.github.com/en/actions/concepts/workflows-and-actions/contexts)
[^4]: [Workflow syntax for GitHub Actions](https://docs.github.com/actions/using-workflows/workflow-syntax-for-github-actions)

---

_Generated by [Kagi Assistant](https://kagi.com/assistant)_
```diff
api/v1alpha1/spindleset_types.go (+9)
@@
 	// +kubebuilder:validation:Required
 	CommitSHA string `json:"commitSHA"`
 
+	// CloneCommands are the git commands to run in the clone init container.
+	// Generated by tangled.org/core/spindle/steps.BuildCloneCommands().
+	// +optional
+	CloneCommands []string `json:"cloneCommands,omitempty"`
+
+	// SkipClone indicates whether to skip the clone init container entirely.
+	// +optional
+	SkipClone bool `json:"skipClone,omitempty"`
+
 	// Workflows is the list of workflows to execute in this pipeline.
 	// +kubebuilder:validation:MinItems=1
 	Workflows []WorkflowSpec `json:"workflows"`
```

```diff
api/v1alpha1/zz_generated.deepcopy.go (+5)
@@
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *PipelineRunSpec) DeepCopyInto(out *PipelineRunSpec) {
 	*out = *in
+	if in.CloneCommands != nil {
+		in, out := &in.CloneCommands, &out.CloneCommands
+		*out = make([]string, len(*in))
+		copy(*out, *in)
+	}
 	if in.Workflows != nil {
 		in, out := &in.Workflows, &out.Workflows
 		*out = make([]WorkflowSpec, len(*in))
@@
       PipelineRun contains pipeline-specific information for this pipeline execution.
       This SpindleSet is ephemeral and represents a single pipeline run.
       properties:
+        cloneCommands:
+          description: |-
+            CloneCommands are the git commands to run in the clone init container.
+            Generated by tangled.org/core/spindle/steps.BuildCloneCommands().
+          items:
+            type: string
+          type: array
         commitSHA:
           description: CommitSHA is the Git commit to checkout.
           type: string
@@
         repoURL:
           description: RepoURL is the Git repository URL to clone.
           type: string
+        skipClone:
+          description: SkipClone indicates whether to skip the clone init
+            container entirely.
+          type: boolean
         workflows:
           description: Workflows is the list of workflows to execute in
             this pipeline.
```

```diff
go.mod (+1 -1)
@@
 )
 
 // Use our custom version of tangled until its upstreamed
-replace tangled.org/core => tangled.org/evan.jarrett.net/core v1.9.1-alpha.0.20251109203816-d73c73e8728b
+replace tangled.org/core => tangled.org/evan.jarrett.net/core v1.9.1-alpha.0.20251111023958-287f7ea82ba6
```