
Appendix A — Capacity Targets & SLOs

Glossary | Back to index | Next: Appendix B — Worker Isolation →


The following targets are conservative defaults derived from the architectural constraints in §2 and §3.5. They are intended as starting points to be refined against real production data; operators are expected to override them based on their cluster size, GitHub plan, and workload profile.

Latency SLOs (per-job, per-tenant)

| Metric | Target | Source | Note |
| --- | --- | --- | --- |
| Pod-creation latency (p95) | ≤ 15 s | actions_gateway_pod_creation_latency_seconds | From acquirejob success to pod Scheduled event. Dominated by image pull on cold nodes; sub-second on warm. |
| Pod-creation latency (p99) | ≤ 60 s | actions_gateway_pod_creation_latency_seconds | Tolerates cold-start image pull. |
| Session reacquisition after Actions Gateway Controller (AGC) restart | ≤ 2 min | derived | Equal to GitHub's redelivery window; jobs redelivered within this window suffer no observable disruption. |
| Token refresh failure budget | < 1 / hour | actions_gateway_token_refresh_errors_total | Anything above this rate indicates either GitHub API instability or a credential problem. |
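
As a rough illustration of how the two pod-creation targets could be checked, the sketch below applies a nearest-rank percentile to raw latency samples. It assumes per-pod latencies in seconds are available as a plain list; if the metric is exported as a histogram, quantile estimation would normally happen in the monitoring system instead. The function names and the sample data are hypothetical.

```python
# Targets copied from the latency table, in seconds.
POD_CREATION_TARGETS_S = {95: 15.0, 99: 60.0}


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def check_pod_creation_slo(samples):
    """Return {percentile: (observed_seconds, within_target)}."""
    report = {}
    for pct, target in POD_CREATION_TARGETS_S.items():
        observed = percentile(samples, pct)
        report[pct] = (observed, observed <= target)
    return report


# Hypothetical data: 97 warm starts at 0.8 s and 3 cold starts at 40 s.
samples = [0.8] * 97 + [40.0] * 3
report = check_pod_creation_slo(samples)
for pct, (observed, ok) in sorted(report.items()):
    print(f"p{pct}: {observed:.1f}s {'OK' if ok else 'VIOLATION'}")
```

In this made-up sample the cold starts only show up at p99, which is the case the looser 60 s target is meant to tolerate.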

Capacity Targets (per-AGC pod, single tenant)

| Resource | Target | Rationale |
| --- | --- | --- |
| Concurrent virtual sessions (peak burst) | ≤ 1,000 | Memory-bound burst ceiling: each session's goroutine stack, HTTP buffer, and token-manager indirection average ~60 KiB resident, so 1,000 sessions ≈ 60 MiB at peak. Steady-state cost is 1 session per RunnerGroup (~60 KiB each), far below this ceiling for typical deployments. |
| Memory request | 2 GiB | Sized for the peak burst ceiling of 1,000 concurrent goroutines (~60 MiB) plus a wide safety margin for Go runtime overhead, heap churn, and reconcile storms; the request is more than 30× the projected session memory. Actual steady-state resident size will be much smaller. |
| Memory limit | 4 GiB | Allows transient bursts during reconcile storms without triggering OOM. |
| CPU request | 500m | Predominantly I/O-bound; the request reflects baseline scheduling weight rather than steady CPU draw. |
| CPU limit | 2 cores | Permits short bursts during reconcile churn or token refresh contention without throttling. |
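
The memory arithmetic behind these rows can be reproduced with a few lines. This is a minimal sketch that takes the ~60 KiB per-session figure at face value (it is a projection, not a measurement); the helper names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PER_SESSION_BYTES = 60 * KIB      # projected resident cost per session
MEMORY_REQUEST_BYTES = 2 * GIB    # from the table
MEMORY_LIMIT_BYTES = 4 * GIB      # from the table


def session_memory_bytes(sessions, per_session=PER_SESSION_BYTES):
    """Projected resident memory attributable to virtual sessions."""
    return sessions * per_session


def headroom_ratio(budget_bytes, sessions):
    """How many times the projected session memory fits in a budget."""
    return budget_bytes / session_memory_bytes(sessions)


peak = session_memory_bytes(1000)
print(f"peak session memory: {peak / MIB:.1f} MiB")
print(f"request headroom: {headroom_ratio(MEMORY_REQUEST_BYTES, 1000):.1f}x")
print(f"limit headroom: {headroom_ratio(MEMORY_LIMIT_BYTES, 1000):.1f}x")
```

Swapping in a per-session cost taken from observed telemetry is the quickest way to see whether the 2 GiB request is still appropriate for a given deployment.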

Capacity Targets (per GitHub App installation)

| Resource | Target | Source |
| --- | --- | --- |
| Concurrent sessions per installation | ≤ 250 | Bounded by §3.5 rate-limit math: ~72 message polls/hr/session against the 15,000/hr installation budget. |
| Sustained RateLimited condition | < 1 min | Anything longer indicates the operator is over budget and should shard across installations. |
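
The two figures quoted in the first row can be turned into a quick budget check. The sketch below counts message polls only and uses hypothetical helper names; the full derivation in §3.5 may include terms that are not reproduced in this appendix.

```python
INSTALLATION_BUDGET_PER_HOUR = 15_000   # from the table
POLLS_PER_SESSION_PER_HOUR = 72         # from the table (approximate)


def hourly_polls(sessions, polls_per_session=POLLS_PER_SESSION_PER_HOUR):
    """Message polls issued per hour by a number of concurrent sessions."""
    return sessions * polls_per_session


def max_sessions_within_budget(budget=INSTALLATION_BUDGET_PER_HOUR,
                               polls_per_session=POLLS_PER_SESSION_PER_HOUR):
    """Largest session count whose polling alone stays within the budget."""
    return budget // polls_per_session


def budget_utilisation(sessions, budget=INSTALLATION_BUDGET_PER_HOUR):
    """Fraction of the hourly budget consumed by polling."""
    return hourly_polls(sessions) / budget


print(max_sessions_within_budget())   # sessions that fit on polling alone
print(hourly_polls(250))              # polls/hr at the 250-session target
print(f"{budget_utilisation(250):.0%}")
```

With only these two inputs, polling alone fits about 208 sessions, and 250 sessions would issue about 18,000 polls per hour. The ≤ 250 target should therefore be checked against the full §3.5 derivation before it is relied on.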

Capacity Targets (per proxy pod)

| Resource | Target | Note |
| --- | --- | --- |
| Concurrent CONNECT tunnels | ≤ 500 | File-descriptor-bound; tune the proxy pod ulimit nofile if increasing. |
| CPU request / limit | 10m / 100m | Defaults per ProxyConfig. Adjust upward if HPA lag is observed under bursty load. |
| Memory request / limit | 32 MiB / 64 MiB | Stateless CONNECT proxies have a small footprint; these defaults survive 500 concurrent tunnels with headroom. |
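
Because the tunnel ceiling is file-descriptor-bound, a back-of-the-envelope estimate of the required nofile limit can help when raising it. The sketch assumes each CONNECT tunnel holds two sockets (one toward the client, one toward the upstream) and reserves a hypothetical 64 descriptors for listeners, logs, and other process overhead; neither number comes from the proxy implementation.

```python
FDS_PER_TUNNEL = 2    # assumption: client socket + upstream socket
RESERVED_FDS = 64     # hypothetical allowance for listeners, logs, etc.


def required_nofile(tunnels, reserved=RESERVED_FDS):
    """Minimum nofile soft limit needed for a number of tunnels."""
    return tunnels * FDS_PER_TUNNEL + reserved


def max_tunnels(nofile_soft_limit, reserved=RESERVED_FDS):
    """Tunnels that fit under a given nofile soft limit."""
    return max(0, (nofile_soft_limit - reserved) // FDS_PER_TUNNEL)


print(required_nofile(500))   # limit needed for the default ceiling
print(max_tunnels(1024))      # tunnels that fit under a limit of 1024
```

On Linux, the limit actually in effect inside the container can be read with Python's `resource.getrlimit(resource.RLIMIT_NOFILE)` and compared against `required_nofile`.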

Tenant-Aggregate Capacity (single ActionsGateway)

| Resource | Target | Note |
| --- | --- | --- |
| Active jobs (worker pods) | ≤ 250 | Conservative default governed by the platform-owned namespace ResourceQuota, maxWorkers, or the last priorityTiers threshold, whichever is most restrictive. Not rate-limit-bounded under the adaptive listener model; increase this ceiling by adjusting the namespace ResourceQuota and per-RunnerGroup concurrency controls. |
| Aggregate namespace ResourceQuota | 20 CPU / 40 GiB memory / 50 pods | Conservative starting allocation. Platform-owned (set on the namespace, not the CR). Adjust against observed job CPU/memory profiles. |
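
The "whichever is most restrictive" rule amounts to taking a minimum over the configured limits. The sketch below illustrates it with a hypothetical function; the maxWorkers and priorityTiers values in the example are made up, and it ignores the fact that a pod quota also counts any non-worker pods in the namespace.

```python
def effective_active_jobs(quota_pods, max_workers=None, last_tier_threshold=None):
    """Return (ceiling, binding_limit_name) for concurrent worker pods.

    Limits that are not configured (None) are ignored.
    """
    limits = {"ResourceQuota pods": quota_pods}
    if max_workers is not None:
        limits["maxWorkers"] = max_workers
    if last_tier_threshold is not None:
        limits["priorityTiers"] = last_tier_threshold
    binding = min(limits, key=limits.get)
    return limits[binding], binding


# Starting quota from the table (50 pods); the other two values are hypothetical.
ceiling, binding = effective_active_jobs(
    quota_pods=50, max_workers=250, last_tier_threshold=200
)
print(f"effective ceiling: {ceiling} (bound by {binding})")
```

With the starting quota of 50 pods, the quota is the binding limit, so the 250-job ceiling only becomes reachable after the platform team raises the namespace ResourceQuota.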

These numbers must be re-derived once two consecutive weeks of production telemetry are available. Treat them as a load-test design input, not as a contract.

Validation status (as of 1.0). The headline capacity figure — up to ~1,000 concurrent virtual sessions per AGC pod — is a design target derived from the memory-budget arithmetic above, not a measured result. No load test has yet exercised an AGC at that concurrency: the load-test harness and the 1,000-session run are deferred post-1.0 (Q13). The per-goroutine resident-cost estimate (~60 KiB) is likewise an architectural projection. Operators should size against their own observed telemetry rather than treat these ceilings as proven.

