Lack of data validation in github.com/argoproj/argo-workflows/v3
Description
Argo Workflows: Unchecked annotation parsing in pod informer crashes Argo Workflows Controller
Summary
An unchecked array index in the pod informer's podGCFromPod() function causes a controller-wide panic when a workflow pod carries a malformed workflows.argoproj.io/pod-gc-strategy annotation. Because the panic occurs inside an informer goroutine (outside the controller's recover() scope), it crashes the entire controller process. The poisoned pod persists across restarts, causing a crash loop that halts all workflow processing until the pod is manually deleted.
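The scoping problem described in the summary comes from how Go handles panics: a deferred recover() only intercepts a panic raised in the same goroutine that registered the deferred call, so a recover() elsewhere in the controller cannot protect an informer callback. The sketch below illustrates the idea with a hypothetical wrapper, withRecover, which is an assumption made for illustration and is not part of Argo Workflows or client-go, nor the project's actual fix.

```go
package main

import (
	"fmt"
	"strings"
)

// withRecover is a hypothetical helper (not an Argo Workflows or client-go
// API). It runs handler in the current goroutine and turns any panic raised
// by it into an error instead of letting the panic terminate the process.
func withRecover(handler func()) (err error) {
	defer func() {
		// recover only sees panics from this goroutine, which is why the
		// deferred call has to live inside the event handler's own call path.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic in handler: %v", r)
		}
	}()
	handler()
	return nil
}

func main() {
	// Simulate the failing access: a value without "/" yields one element,
	// so indexing element 1 panics at run time.
	err := withRecover(func() {
		parts := strings.Split("OnPodCompletion", "/")
		fmt.Println(parts[1])
	})
	fmt.Println("handler result:", err)
}
```

A wrapper like this would only contain the damage; validating the annotation before indexing is still needed to remove the root cause.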
Details
podGCFromPod() splits the annotation value on "/" and unconditionally accesses parts[1]:
    func podGCFromPod(pod *apiv1.Pod) wfv1.PodGC {
    	if val, ok := pod.Annotations[common.AnnotationKeyPodGCStrategy]; ok {
    		parts := strings.Split(val, "/")
    		return wfv1.PodGC{Strategy: wfv1.PodGCStrategy(parts[0]), DeleteDelayDuration: parts[1]}
    	}
    	return wfv1.PodGC{Strategy: wfv1.PodGCOnPodNone}
    }
If the annotation value contains no "/", parts has length 1 and parts[1] panics with index out of range.
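A bounds check before the second access is enough to avoid the panic. The following is a minimal sketch under stated assumptions, not the upstream patch: PodGC is a simplified stand-in for wfv1.PodGC, parsePodGC is a hypothetical helper name, and the annotation values in main are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// PodGC is a simplified stand-in for wfv1.PodGC, used here only so the
// example compiles on its own.
type PodGC struct {
	Strategy            string
	DeleteDelayDuration string
}

// parsePodGC is a hypothetical helper. It splits the annotation value on "/"
// like the original code, but reads the second element only when it exists.
func parsePodGC(val string) PodGC {
	// strings.Split with a non-empty separator always returns at least one
	// element, so parts[0] is safe even for an empty value.
	parts := strings.Split(val, "/")
	gc := PodGC{Strategy: parts[0]}
	if len(parts) > 1 {
		gc.DeleteDelayDuration = parts[1]
	}
	return gc
}

func main() {
	fmt.Printf("%+v\n", parsePodGC("OnPodCompletion/30s")) // both fields set
	fmt.Printf("%+v\n", parsePodGC("OnPodCompletion"))     // no delimiter, no panic
}
```

With this shape, a value that lacks the delimiter produces an empty delay instead of an out-of-range access. Whether an empty delay should be accepted or rejected is a design choice this sketch leaves open.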
The code was introduced in #14129 and affects versions:
3.6.x: v3.6.5 through v3.6.19 (backport in #14263)
3.7.x: v3.7.0-rc1 through v3.7.12
4.x: v4.0.0-rc1 through v4.0.3
Not affected: v3.6.4 and earlier
PoC
Apply this workflow to a cluster running the Argo Workflows controller:
    kubectl apply -n argo -f - <<'EOF'
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      name: crash-podgc
    spec:
      entrypoint: main
      serviceAccountName: default
    ...
Within seconds the controller crashes. The controller pod then reports CrashLoopBackOff with an increasing restart count, and the controller logs show:
    panic: runtime error: index out of range [1] with length 1

    goroutine 291 [running]:
    github.com/argoproj/argo-workflows/v4/workflow/controller/pod.podGCFromPod(...)
    	/home/runner/work/argo-workflows/argo-workflows/workflow/controller/pod/controller.go:176
    github.com/argoproj/argo-workflows/v4/workflow/controller/pod.(*Controller).commonPodEvent(...)
    	/home/runner/work/argo-workflows/argo-workflows/workflow/controller/pod/controller.go:197
    github.com/argoproj/argo-workflows/v4/workflow/controller/pod.(*Controller).addPodEvent(...)
    ...
Recovery requires deleting the poisoned workflow:
kubectl delete workflow -n argo crash-podgc
Impact
Any user who can submit workflows can crash the Argo Workflows controller and keep it down indefinitely. This is a denial-of-service against all workflows in the cluster. No workflows can make progress while the controller is crash-looping. The attacker needs only create permission on Workflow resources, which is the baseline permission for any Argo Workflows user.
Mitigation
Update Impact
Minimal update. May introduce new vulnerabilities or breaking changes.
| Ecosystem | Package | Affected version | Patched versions |
|---|---|---|---|
| go | | | 3.7.14 |
| go | | | 4.0.5 |