Appendix A — Capacity Targets & SLOs¶
← Glossary | Back to index | Next: Appendix B — Worker Isolation →
The following targets are conservative defaults derived from the architectural constraints in §2 and §3.5. They are intended as starting points to be refined against real production data; operators are expected to override them based on their cluster size, GitHub plan, and workload profile.
Latency SLOs (per-job, per-tenant)¶
| Metric | Target | Source | Note |
|---|---|---|---|
| Pod-creation latency (p95) | ≤ 15s | actions_gateway_pod_creation_latency_seconds |
From acquirejob success to pod Scheduled event. Dominated by image pull on cold nodes; sub-second on warm. |
| Pod-creation latency (p99) | ≤ 60s | actions_gateway_pod_creation_latency_seconds |
Tolerates cold-start image pull. |
| Session reacquisition after Actions Gateway Controller (AGC) restart | ≤ 2 min | derived | Equal to GitHub's redelivery window; jobs redelivered within this window suffer no observable disruption. |
| Token refresh failure budget | < 1 / hour | actions_gateway_token_refresh_errors_total |
Anything above this rate indicates either GitHub API instability or a credential problem. |
Capacity Targets (per-AGC pod, single tenant)¶
| Resource | Target | Rationale |
|---|---|---|
| Concurrent virtual sessions (peak burst) | ≤ 1,000 | Memory-bound burst ceiling: each goroutine stack + HTTP buffer + token-manager indirection averages ~60 KiB resident; 1,000 sessions ≈ 60 MiB at peak. Steady-state cost is 1 session per RunnerGroup (~60 KiB each), far below this ceiling for typical deployments. |
| Memory request | 2 GiB | Sized for the peak burst ceiling of 1,000 concurrent goroutines (~60 MiB) with 4× safety margin for Go runtime overhead, heap churn, and reconcile storms. Actual steady-state resident size will be much smaller. |
| Memory limit | 4 GiB | Allows transient bursts during reconcile storms without triggering OOM. |
| CPU request | 500m | Predominantly I/O-bound; request reflects baseline scheduling weight rather than steady CPU draw. |
| CPU limit | 2 (cores) | Permits short bursts during reconcile churn or token refresh contention without throttling. |
Capacity Targets (per GitHub App installation)¶
| Resource | Target | Source |
|---|---|---|
| Concurrent sessions per installation | ≤ 250 | Bounded by §3.5 rate-limit math: ~72 message polls/hr/session against the 15,000/hr installation budget. |
Sustained RateLimited condition |
< 1 min | Anything longer indicates the operator is over budget and should shard across installations. |
Capacity Targets (per proxy pod)¶
| Resource | Target | Note |
|---|---|---|
| Concurrent CONNECT tunnels | ≤ 500 | File-descriptor-bound; tune the proxy pod ulimit nofile if increasing. |
| CPU request / limit | 10m / 100m | Defaults per ProxyConfig. Adjust upward if HPA lag is observed under bursty load. |
| Memory request / limit | 32 MiB / 64 MiB | Stateless CONNECT proxies have a small footprint; these defaults survive 500 concurrent tunnels with headroom. |
Tenant-Aggregate Capacity (single ActionsGateway)¶
| Resource | Target | Note |
|---|---|---|
| Active jobs (worker pods) | ≤ 250 | Conservative default governed by the platform-owned namespace ResourceQuota, maxWorkers, or the last priorityTiers threshold — whichever is most restrictive. Not rate-limit-bounded under the adaptive listener model; increase this ceiling by adjusting the namespace ResourceQuota and per-RunnerGroup concurrency controls. |
| Aggregate namespace ResourceQuota | 20 CPU / 40Gi memory / 50 pods | Conservative starting allocation. Platform-owned (set on the namespace, not the CR). Adjust against observed job CPU/memory profiles. |
These numbers must be re-derived once two consecutive weeks of production telemetry are available. Treat them as a load-test design input, not as a contract.
Validation status (as of 1.0). The headline capacity figure — up to ~1,000 concurrent virtual sessions per AGC pod — is a design target derived from the memory-budget arithmetic above, not a measured result. No load test has yet exercised an AGC at that concurrency: the load-test harness and the 1,000-session run are deferred post-1.0 (Q13). The per-goroutine resident-cost estimate (~60 KiB) is likewise an architectural projection. Operators should size against their own observed telemetry rather than treat these ceilings as proven.
← Glossary | Back to index | Next: Appendix B — Worker Isolation →