Why GAG over ARC?
GitHub Actions Gateway (GAG) targets one situation that Actions Runner Controller
(ARC) scale-set mode does not handle well: running many runner groups for many
tenants in a shared Kubernetes cluster, cost-effectively, under one
ResourceQuota. In that setting ARC's problems compound, and they all trace back
to cost and self-service. Most importantly, ARC's poor fit with ResourceQuota
makes per-tenant quotas unsafe, and that is what blocks letting tenants run
their own runners. GAG was built to solve these problems together.
The problem
- `ResourceQuota` is unsafe with ARC, so self-service is too. When a runner pod is preempted, OOM-killed, or rejected because the namespace quota is full, ARC retries pod creation on the same runner (a 30-second retry loop was added in recent versions), but it has no flow to release the GitHub job and reassign it to a runner that can run it. After a fixed number of failures the runner is marked `Failed`, and the job stays stuck in GitHub's queue (for up to its 24-hour timeout) until someone clears the runner and reruns the job manually. Quota exhaustion therefore turns into stuck jobs rather than graceful queueing, which discourages enforcing the very `ResourceQuota` you need to let tenants safely manage their own runner counts.
- Scheduling starvation under a shared quota. Each ARC `AutoscalingRunnerSet` has its own `maxRunners` cap, but there is no primitive for "GPU runners must always claim at least N slots, regardless of how many CPU runners are active." Cheap CPU pods exhaust the quota first and the most expensive hardware loses the race, so a PR's big tests stall behind a flood of small ones.
- Listener overhead at scale. ARC's scale-set listener is one pod per scale set, roughly 256 MiB resident plus a cluster IP, held alive 24/7 to long-poll GitHub. Ten scale sets cost ~2.5 GiB (10 × 256 MiB) and 10 pod slots at rest, before any job runs.
- Platform team as bottleneck. Onboarding a tenant means provisioning a namespace, quotas, controller scope, scale sets, NetworkPolicies, and egress: a platform-team checklist per team, with every later change landing as a ticket.
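To make the starvation case concrete, here is a minimal sketch of a platform-owned quota shared by two ARC scale sets. The quota name, namespace, and all numbers are hypothetical; only the `ResourceQuota` fields are standard Kubernetes. The overcommit arithmetic is in the comments.

```yaml
# Hypothetical platform-owned quota for a tenant namespace that hosts two
# ARC scale sets. Names and numbers are illustrative only.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: runner-quota      # hypothetical name
  namespace: team-a
spec:
  hard:
    pods: "20"            # total pod slots for every runner in the namespace
# Suppose the CPU scale set has maxRunners: 30 and the GPU scale set has
# maxRunners: 10. The two caps sum to 40 pods against 20 slots, and nothing
# reserves any of the 20 for GPU runners. If 20 CPU jobs arrive first, every
# new GPU runner pod is rejected at quota admission until CPU pods finish.
```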
GAG vs ARC scale-set mode
| Capability | ARC scale-set mode | GitHub Actions Gateway |
|---|---|---|
| Safe under a per-tenant `ResourceQuota` | Quota-blocked jobs stall; manual cleanup + rerun | Automatic fast lock-cancel + rerun, per-job budget |
| Guaranteed floor for critical runner types | No per-quota primitive | Priority tiers per runner group |
| Scale workers to zero between jobs | Yes (`minRunners: 0`) | Yes; workers exist only while a job runs |
| Per-tenant dedicated egress IPs | Shared cluster egress | Per-tenant HTTPS CONNECT proxy pool |
| Listener overhead (10 runner groups, at rest) | ~2.5 GiB across 10 pods | ~600 KiB in 1 shared pod |
GAG also exposes Prometheus metrics scoped per tenant and runner group
(see Observability), and, like every capability in the table, it is
driven by the single `ActionsGateway` resource shown below.
For the limits and Service Level Objectives behind these claims, see Appendix A: Capacity Targets & SLOs; for the utilization-and-cost argument, see Appendix F: Cost model.
One resource, a whole gateway
A tenant declares what they want in a single namespace-scoped resource. The Gateway Manager Controller (GMC) provisions the controller, proxy pool, RBAC, network policies, and quota to match — no cluster-admin involvement after the initial GMC install.
```yaml
apiVersion: actions-gateway.github.com/v1alpha1
kind: ActionsGateway
metadata:
  name: team-a-gateway
  namespace: team-a
spec:
  gitHubAppRef:
    name: my-github-app # (1)!
  gitHubURL: https://github.com/team-a-org
  securityProfile: baseline # (2)!
  proxy:
    minReplicas: 2 # (3)!
    maxReplicas: 10
  # No namespaceQuota field: the ResourceQuota is platform-owned (4)!
  runnerGroups:
    - name: gpu-runners
      runnerLabels: ["self-hosted", "gpu"]
      maxListeners: 10
      priorityTiers: # (5)!
        - priorityClassName: runner-critical
          threshold: 5
        - priorityClassName: runner-standard
          threshold: 20
      podTemplate:
        spec:
          containers:
            - name: runner
              resources:
                limits:
                  nvidia.com/gpu: "1"
    - name: cpu-runners
      runnerLabels: ["self-hosted", "linux"]
      maxWorkers: 30
      podTemplate:
        spec:
          containers:
            - name: runner
```
1. References a `Secret` in the same namespace holding the GitHub App `appId`, `installationId`, and `privateKey`. The GMC watches the reference name, not the Secret contents; see credential rotation.
2. Selects the Pod Security Admission level the GMC stamps on the namespace. Defaults to `baseline`; use `restricted` for stricter isolation, or `privileged` only for workloads like docker-in-docker. See Security.
3. The per-tenant egress proxy pool is HPA-managed between these bounds; all GitHub traffic exits through it on dedicated IPs.
4. The single `ResourceQuota` that every runner group shares is platform-owned: the platform admin sets it on the namespace, not on this CR, so it is a real cap the tenant cannot raise. Priority tiers decide who wins when it is contended.
5. The first 5 GPU pods get the higher-priority `runner-critical` PriorityClass; pods 6 through 20 get `runner-standard` and burst opportunistically; the final threshold (20) caps total concurrency. The `priorityClassName` values must be on the platform's allowlist (the GMC `--allowed-priority-classes` flag), and whether a tier preempts is set on the platform-owned `PriorityClass` object, so a tenant cannot name a class that evicts other tenants' pods.
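For reference, the objects annotations (1), (2), and (5) point at are ordinary Kubernetes resources, sketched below under stated assumptions: the Secret values are placeholders, the Namespace shows the standard Pod Security Admission label we assume the GMC applies for `securityProfile: baseline`, and the `PriorityClass` values and preemption choices are illustrative picks a platform team might make, not GAG defaults.

```yaml
# (1) GitHub App credentials referenced by spec.gitHubAppRef.name.
# Key names follow annotation (1); all values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: my-github-app
  namespace: team-a
type: Opaque
stringData:
  appId: "123456"              # placeholder
  installationId: "98765432"   # placeholder
  privateKey: |
    -----BEGIN RSA PRIVATE KEY-----
    placeholder: paste the App's PEM-encoded private key here
    -----END RSA PRIVATE KEY-----
---
# (2) Assumed result of securityProfile: baseline. The GMC, not the tenant,
# manages this label; shown only to illustrate the standard PSA label.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    pod-security.kubernetes.io/enforce: baseline
---
# (5) Platform-owned PriorityClasses named in priorityTiers. Values are
# illustrative. preemptionPolicy controls whether the scheduler may evict
# lower-priority pods when nodes are full.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: runner-critical
value: 100000
globalDefault: false
preemptionPolicy: PreemptLowerPriority   # may evict lower-priority pods
description: "Guaranteed floor for critical runner pods"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: runner-standard
value: 1000
globalDefault: false
preemptionPolicy: Never   # schedules ahead of lower priority, never evicts
description: "Opportunistic burst tier for runner pods"
```

Keeping `PriorityClass` objects cluster-scoped and platform-owned is what lets the allowlist in annotation (5) act as the safety boundary: tenants pick among names, while the platform decides which names may preempt.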
Ready to try it? Follow the getting-started guide.