7. Test Plan


Testing is structured in three layers. Each layer has a distinct scope, speed contract, and failure signal. All three layers run in CI, but only unit and integration tests gate PRs. End-to-end tests run nightly against a staging cluster, except for the kind-based Tiers A and B, which run on every merge to main (see 7.3). Multi-tenant scenarios are explicitly covered at the integration and end-to-end layers, since tenant isolation is a correctness property of the system, not just a deployment concern.

7.1. Unit Tests

Scope: Pure logic within a single package — no network, no Kubernetes API, no real file I/O.

Speed contract: Full suite runs in under 30 seconds. Any test requiring a sleep or external call does not belong here.

Tooling: Standard go test ./... with testify/assert. Use go test -race in CI to catch goroutine data races.

What to cover:

Broker API client — Request construction, header injection, and response parsing for sessions, message, acquirejob, and renewjob. Use httptest.NewServer to serve static JSON fixtures without hitting GitHub. Assert that acquirejob and renewjob use the run_service_url from the message body, not the broker URL.
RenewJob loop — Verify the per-job goroutine calls renewjob at the correct 60-second interval, stops cleanly when the job completes, and handles a non-200 response from renewjob by surfacing an error without panicking.
Rate-limit (429) backoff — Drive the broker API client against an httptest server that returns 429 Too Many Requests with a Retry-After: 30 header. Assert the client honors Retry-After, increments actions_gateway_message_poll_errors_total{reason="rate_limited"}, and falls back to exponential backoff capped at 5 minutes when the header is absent. Assert that sustained 429s for >10 minutes surface a RateLimited condition on the corresponding RunnerGroup.
Token Manager — Use a fake clock to advance time to 5 minutes before token expiry and assert the Token Manager fetches a new token before the old one expires. Assert that session goroutines reading the token during a refresh window get a valid (old or new) token and are never blocked. Assert that actions_gateway_token_refresh_errors_total increments on each failed refresh attempt; the alerting threshold for this metric is defined in docs/operations/observability.md (> 0 for 5 minutes triggers a page).
Payload decryption — AES-256 decryption of the TaskAgentMessage.Body field. Test against a pre-generated key/ciphertext pair committed as a fixture. Test failure modes: wrong key, truncated payload, invalid base64.
Session registry — Goroutine lifecycle management: spawn N sessions, verify N goroutines are running, scale down to M, verify M remain with no leaks. Use goleak.VerifyNone(t) (from go.uber.org/goleak) as a test cleanup hook — it identifies leaked goroutines by stack trace, making failures actionable. Do not use runtime.NumGoroutine() deltas, which include Go runtime goroutines and produce unreliable counts.
Label-to-pod mapping — The logic that translates RunnerGroup runner labels to a target pod spec. Table-driven tests covering label matches, mismatches, defaults, and invalid configurations.
AGC reconciler state machine — Unit-test the AGC reconciler's desired-vs-actual diffing logic with a fake client.Client (provided by controller-runtime/pkg/client/fake). Cover create, update, scale-up, scale-down, and delete transitions.
GMC reconciler state machine — Unit-test the GMC reconciler with a fake client. For a given ActionsGateway spec, assert that the reconciler produces exactly the expected set of Kubernetes objects (ServiceAccount, Role, RoleBinding, NetworkPolicy, proxy Deployment, proxy Service, proxy PodDisruptionBudget, HPA, AGC Deployment) all within the CR's own namespace. The reconciler does not create a ResourceQuota — that is platform-owned (Q130); assert it leaves any pre-existing namespace quota untouched. Table-driven tests covering spec creation, proxy scaling bound changes, and deletion. For credential rotation specifically: assert that updating gitHubAppRef.Name causes the GMC to update the AGC Deployment's volume reference to the new Secret name (triggering a rollout) and does not mutate or delete the old Secret.
HPA spec generation — For a range of ProxyConfig inputs (explicit values, all defaults, boundary values), assert the generated HorizontalPodAutoscaler has the correct minReplicas, maxReplicas, and targetCPUUtilizationPercentage. Assert that minReplicas is always ≥ 1 and ≤ maxReplicas. Assert proxy pods always have resources.requests.cpu set (required for HPA metric computation).
Proxy env injection — Assert that the AGC Deployment spec produced by the reconciler contains HTTP_PROXY, HTTPS_PROXY, and NO_PROXY env vars. Assert NO_PROXY includes kubernetes.default.svc.cluster.local and the configured noProxyCIDRs. Assert the same three vars appear in the worker pod template.
Status Conditions — Assert that the GMC sets the Ready, ProxyAvailable, and AGCAvailable conditions on ActionsGatewayStatus correctly as components become healthy or degrade. Assert conditions use standard metav1.Condition types compatible with kubectl wait --for=condition=Ready.
Runner version rejection — Unit-test the session goroutine's handling of a 400 Bad Request from POST /sessions containing a version-too-old message. Assert the goroutine surfaces the error as a RunnerGroup condition rather than silently retrying in a tight loop.
GMC RBAC boundary assertions — Enumerate the generated ClusterRole rules and assert that no rule grants * verbs on secrets, pods, or nodes at the cluster level. This is a regression guard against accidental privilege escalation during development.
gitHubAppRef namespace defaulting — Unit-test the defaulting logic: when Namespace is omitted, assert it resolves to the ActionsGateway CR's own namespace; when set explicitly, assert that value is used instead.
Reserved namespace blocklist validation — Unit-test the admission webhook logic that rejects ActionsGateway CRs created in reserved namespaces. The static defaults are kube-system, kube-public, and gmc-system. The webhook also reads POD_NAMESPACE (downward API) at setup time and adds it to the set, so installs into a non-default namespace are protected. Tests cover the static defaults plus a custom-install namespace driven by the constructor.

7.2. Integration Tests

Scope: Multiple components interacting with real infrastructure dependencies — a live Kubernetes API server and a stubbed GitHub backend. No actual GitHub network calls. No container image builds and no real Kubernetes scheduling — pods are created in the API server but never actually scheduled.

Speed contract: Full suite runs in under 5 minutes. Each test must complete in under 30 seconds. Tests run against a local envtest API server (from controller-runtime). kind is not used for this layer — it requires container builds and is slower than envtest.

Build tag: all integration test files carry //go:build integration. This keeps them out of the unit-test run (go test ./...) and requires an explicit -tags integration flag. CI runs them as a separate job after unit tests pass. Tests live under cmd/{agc,gmc}/internal/controller/integration/. See docs/development/testing.md for the run commands.

Why envtest, not the fake client: the fake client.Client cannot enforce CRD admission validation (CEL rules, x-kubernetes-validations), does not handle ownership references and garbage collection, and cannot test webhook behavior. envtest spins up a real kube-apiserver and etcd binary locally (no kubelet, no scheduler), so CRD schemas, admission webhooks, and status subresources all behave as in production.

Tooling: controller-runtime/pkg/envtest for the Kubernetes API surface. A shared stateful httptest fake broker under internal/brokertest/ for the GitHub broker — tests control it by enqueuing job messages on demand and asserting which sessions were deleted, rather than replaying static fixtures. Standard go test with testify and gomega.Eventually for eventually-consistent assertions. Ginkgo is not used; the integration tests follow the same testing-package style as the unit tests in this repo.

What to cover:

CRD install and validation — Install both ActionsGateway and RunnerGroup CRD schemas into envtest. Verify valid manifests are accepted. Verify invalid specs are rejected at admission: the namespace blocklist webhook rejects reserved names; CRD CEL rules reject priorityTiers in non-ascending threshold order; CRD CEL rules reject maxWorkers values that conflict with the last priorityTiers threshold; runnerLabels is rejected when empty (MinItems=1) or when an item contains whitespace/commas (per-item Pattern). The securityProfile no-downgrade guard is enforced by the GMC validating webhook (not CEL, since it reads metadata.annotations) and is unit-tested directly: an upgrade (baseline → restricted) and same-value update pass; a downgrade (restricted → baseline, or anything → privileged, including a dropped-field re-default to baseline) is rejected unless the allow-profile-downgrade: "true" annotation is present.
GMC tenant provisioning — Create a namespace, then apply an ActionsGateway CR into it. Verify the GMC creates all expected child resources within that same namespace: ServiceAccount (AGC + worker), Role, RoleBinding, NetworkPolicy, proxy Deployment (with resources.requests.cpu set), proxy Service, proxy PodDisruptionBudget, HPA, AGC Deployment (with HTTP_PROXY, HTTPS_PROXY, and NO_PROXY set), and bootstrap RunnerGroups. Verify the GMC does not create or modify the namespace itself, and does not create a ResourceQuota (platform-owned — Q130). TestGMC_TenantProvisioning_NoResourceQuotaCreated asserts a pre-existing platform quota is left untouched. Assert ActionsGatewayStatus.Conditions includes the ProxyAvailable and AGCAvailable condition types. Note: because envtest does not schedule pods, Deployment.status.readyReplicas stays at 0 — the Ready condition will not become True and tests assert the non-Ready state is reported correctly rather than asserting Ready=True. Additional provisioning cases: gitHubAppRef.namespace omitted defaults to the CR's own namespace; spec.proxy.noProxyCIDRs is merged with (not replaced by) the mandatory cluster-internal exclusions; updating gitHubAppRef.name causes the AGC Deployment to reference the new Secret without deleting the old one.
GMC tenant teardown — Delete an ActionsGateway CR and verify the GMC removes all associated resources, including the proxy Deployment, Service, PodDisruptionBudget, and HPA, without affecting any other tenant namespace. Assert that a second ActionsGateway CR remains fully intact. Also verify that re-applying the same CR after teardown brings all resources back cleanly.
HPA bounds update — Update spec.proxy.maxReplicas on a live ActionsGateway CR and verify the GMC patches the HPA to reflect the new bound within one reconcile cycle.
Proxy NetworkPolicy content — Verify the content of the generated NetworkPolicy in the API server: proxy pod egress includes the GitHub CIDR rules; AGC and worker pods have egress rules only to the proxy ClusterIP; DNS egress is always present. Verify that spec.proxy.managedNetworkPolicy: false suppresses the GitHub CIDR egress rules. Verify the IPRangeReconciler patches an existing policy when the fetched CIDR set changes. Note: envtest does not run a CNI plugin — these tests verify the NetworkPolicy spec content, not actual packet filtering.
AGC RBAC scope enforcement — Provision an ActionsGateway CR so the GMC creates the actions-gateway-controller ServiceAccount and its Role/RoleBinding. Impersonate that ServiceAccount via rest.ImpersonateConfig and attempt to list resources in a different tenant's namespace. Assert the API server returns 403. Assert that listing in the same namespace returns 200.
AGC reconciler end-to-end — Deploy a RunnerGroup, verify the AGC starts exactly one listener goroutine at rest (the permanent baseline). Enqueue jobs via the fake broker and verify additional goroutines spawn up to .spec.maxListeners. Verify idle goroutines shut down once the queue empties, leaving exactly one active listener. Update .spec.maxListeners, verify the new ceiling takes effect without restarting in-flight goroutines. Delete the resource, verify all goroutines exit and no agent Secrets or worker Pods are orphaned.
Secret lifecycle — Verify that a Secret is created with the correct payload labels when a job is intercepted, scoped to the correct tenant namespace, and deleted after the pod terminates. In envtest, pod phase must be advanced manually (no kubelet) — tests set the pod status to Succeeded via the status subresource client.
Event-driven pod completion — Verify the provisioner's InformerPodWaiter resolves a blocked session when the shared Pod informer observes the worker pod reach a terminal phase (and on pod deletion), rather than polling pod state on a timer. TestInformerPodWaiter_RealInformer runs a real manager cache against envtest, creates a pod, advances its status to Succeeded via the status subresource, and asserts WaitForCompletion returns promptly with the terminal phase.
Worker-Pod watch re-triggers reconciliation — Verify the RunnerGroup controller watches the worker Pods its provisioner creates, so a Pod lifecycle event (create on job acquire, terminal-phase transition, eviction, or delete) re-triggers a reconcile and refreshes the RunnerGroup's status without waiting for the next spec change or cache resync. TestAGC_Reconciler_WorkerPodEventTriggersReconcile runs a real manager against envtest, lets the controller quiesce (its reconcile count stops increasing), injects a sentinel condition into the listener-condition channel, then creates a labelled worker Pod and asserts the resulting reconcile both increments the reconcile count and flushes the buffered condition into status. The watch is deliberately Pods-only — a Secret watch would establish a Secret informer and cache credential material, violating the W3/H-2 cache-isolation property.
Pod provisioning — Verify that the AGC creates a Pod with the correct image, resource limits, volume mounts, and security context when a job payload is received from the fake broker. Verify controller-enforced invariants are applied unconditionally: automountServiceAccountToken: false, serviceAccountName: actions-gateway-worker, HTTP_PROXY/HTTPS_PROXY/NO_PROXY env vars injected with the provisioner's values. Verify priorityTiers tier assignment by pod count. Verify maxWorkers ceiling holds the third pod until an active pod completes. Verify the secure-by-default hardening: with SECURITY_PROFILE unset/baseline the pod gets pod-level runAsNonRoot + seccomp RuntimeDefault but no per-container privilege-escalation/capability floor; with restricted the per-container allowPrivilegeEscalation:false + drop-ALL floor is added; with privileged no SecurityContext defaults are stamped. Verify default 500m/1Gi requests+limits are stamped when the tenant container omits them, and that explicit tenant SecurityContext/resource values are preserved (gap-fill only). Verify the recommended app.kubernetes.io/* labels are present.
Failure recovery — Simulate a non-eviction pod failure (set pod status to Failed without reason: Evicted) and verify the AGC cleans up the associated Secret without leaking it and without triggering an automatic rerun. Simulate an eviction (set reason: Evicted) and verify the AGC calls the rerun API, increments actions_gateway_eviction_retries_total, and cleans up the Secret. Verify maxEvictionRetries: 0 suppresses the rerun call and increments actions_gateway_eviction_retries_exhausted_total instead. Simulate a namespace ResourceQuota rejection on pod create and verify the provisioner retries up to maxQuotaRetries times, increments actions_gateway_quota_retries_total on each attempt, and increments actions_gateway_quota_retries_exhausted_total when the budget is exhausted. Verify maxQuotaRetries: 0 returns an error immediately without touching any quota counter.
SIGTERM session cleanup — Start an AGC against the fake broker, burst the listener count to N sessions. Cancel the reconciler's context (simulating SIGTERM). Assert the AGC issues DELETE /sessions/{id} for every registered session before the context fully unwinds, confirmed by the fake broker. Assert no goroutine leak via goleak.VerifyNone.
Worker-pod reaper — Verify the RunnerGroup reconciler's reaper against a real apiserver: a worker pod advanced to a terminal phase via the status subresource is deleted once completedPodTTL elapses, driven by the real Pod watch and the reconciler's RequeueAfter timer (TestAGC_Reaper_CompletedPodDeletedAfterTTL); a pod that never schedules (envtest has no scheduler, so every pod is genuinely Pending) is deleted once pendingPodDeadline elapses, while a fresh Pending pod within the deadline survives (TestAGC_Reaper_StuckPendingPodDeletedAfterDeadline); and a pod created through the real provisioner path carries a controller OwnerReference to its RunnerGroup with the apiserver-assigned UID (TestAGC_Reaper_WorkerPodHasOwnerRef). The ownerRef cascade is not provable in envtest — there is no kube-controller-manager, so no GC controller runs — which is why only the cascade hook (the OwnerReference) is asserted here, while the operational cleanup behaviour is asserted at Tier B (E2E_AGC_WorkerPodLifecycle).

7.3. End-to-End Tests

Scope: the full system deployed into a real Kubernetes cluster. GMC, AGC, and proxy binaries run as actual Pods. Proxy pods are scheduled and connected. Cert-manager issues TLS certificates for the admission webhook. NetworkPolicy is enforced by the CNI. HPA scaling is driven by metrics-server.

Tier structure. End-to-end tests split into three tiers along the "what's required to run them" axis:

| Tier | What it tests | GitHub required? | When it runs |
| --- | --- | --- | --- |
| A — Infrastructure | GMC provisioning, proxy scheduling, NetworkPolicy enforcement, HPA, PDB, RBAC, teardown, GMC restart | No | every merge to `main` |
| B — Lifecycle (fake broker) | AGC session polling, job acquisition, pod creation, eviction retry, SIGTERM cleanup | No | every merge to `main` |
| C — Real GitHub | Actual workflow dispatch, log streaming, RenewJob across renewal cycles, proxy egress IP routing | Yes (GitHub App credentials) | nightly + on-demand |

Tier A and Tier B run on a local kind cluster — fast enough for CI on every merge and for the inner dev loop. Tier C runs against a real GitHub App and a dedicated test repository; it requires E2E_GITHUB_APP_ID, E2E_GITHUB_APP_INSTALLATION_ID, E2E_GITHUB_APP_PRIVATE_KEY, E2E_GITHUB_ORG, and E2E_GITHUB_REPO to be set, and is skipped at runtime when any are missing.
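The runtime skip for Tier C can be sketched as a small helper that reports which required variables are absent. The variable names come from the paragraph above; missingEnv and missingTierCEnv are hypothetical helpers, and treating an empty value as missing is an assumption.

```go
package main

import "os"

// tierCEnv lists the variables Tier C requires, as named in the test plan.
var tierCEnv = []string{
	"E2E_GITHUB_APP_ID",
	"E2E_GITHUB_APP_INSTALLATION_ID",
	"E2E_GITHUB_APP_PRIVATE_KEY",
	"E2E_GITHUB_ORG",
	"E2E_GITHUB_REPO",
}

// missingEnv returns the names in required that lookup reports as unset
// or empty. Taking lookup as a parameter keeps the helper testable
// without mutating the process environment.
func missingEnv(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// missingTierCEnv checks the real process environment.
func missingTierCEnv() []string {
	return missingEnv(tierCEnv, os.LookupEnv)
}
```

A suite setup hook could call missingTierCEnv and skip with the returned names in the message, so a nightly run that lacks credentials reports exactly which variable was absent.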

What kind adds over envtest integration tests:

| Capability | envtest (§7.2) | kind (Tier A/B) |
| --- | --- | --- |
| CRD admission + CEL validation | ✅ | ✅ |
| Admission webhook with cert-manager TLS | ⚠️ requires manual cert workaround | ✅ |
| Real pod scheduling (kubelet present) | ❌ | ✅ |
| Container images actually pulled and run | ❌ | ✅ |
| NetworkPolicy enforcement (CNI) | ❌ | ✅ |
| HPA scaling (`metrics-server` required) | ❌ | ✅ |
| PDB enforcement during node drain | ❌ | ✅ |
| Proxy CONNECT tunnel actually relays bytes | ❌ | ✅ |
| GMC/AGC/proxy binaries running as real Pods | ❌ | ✅ |
| Deployment rollout behavior | ❌ | ✅ |

Speed contract: each Tier A/B test completes in under 3 minutes. Tiers A and B together run in roughly 30 minutes or less. Tier C tests take 2–15 minutes each (the 15-minute RenewJob test sets the upper bound). Cluster setup is a one-time BeforeSuite cost, not counted against individual test budgets.

Build tag: all e2e files carry //go:build e2e. Tests are excluded from both go test ./... and the integration test run. Two Tier-A tests (E2E_GMC_HPA_ScalesUpUnderLoad and E2E_GMC_PDB_PreventsEvictionBelowMinAvailable) carry the Ginkgo Label("local-only") and are excluded from CI via --label-filter '!local-only' because they depend on CPU-load timing that is flaky on 2-vCPU GitHub Actions runners. They pass reliably on a local machine. See docs/development/testing.md for run commands.

Cluster shape. A multi-node kind cluster (1 control-plane + 2 workers) is required for pod anti-affinity and PDB tests to be meaningful. kindnet is the default CNI; it accepts NetworkPolicy objects but its bundled kube-network-policies enforcer does not drop egress traffic for the negative cases, so the two runtime egress-negative specs (E2E_GMC_TenantProvisioning_WorkloadEgressBlockedToNonProxyPod, E2E_GMC_TenantProvisioning_WorkerCannotReachK8sAPI) self-skip on kindnet. The make e2e-cluster KIND_CNI=calico profile (see kind-iteration.md) installs Calico instead; on that cluster the negatives assert real packet drops, each paired with an unlabelled control pod that proves the destination is reachable so a drop is attributable to NetworkPolicy enforcement.

Fake GitHub for Tier B. Tier B replaces real GitHub with test/fakegithub/, a standalone HTTP server deployed into the cluster as a Deployment + Service. It implements the broker protocol with stateful behaviour (session registration, controllable job-message delivery, recorded acquire/renew/rerun calls), the runner registration API (generate-jitconfig issuing real JIT blobs, list-by-name, delete, 409 on name collision), and exposes a control API used by tests to inject jobs and assert on calls. It can also simulate GitHub's single-use JIT runner behaviour (Q114): with SINGLE_USE_RUNNERS=true or POST /control/singleuse?enabled=true[&owner=<prefix>], a job acquisition deletes the delivering session's runner record — the dead session then serves one empty 200 (the decode response: EOF signature) and 401s thereafter, and re-registering a surviving name returns 409 — reproducing the M4 §12 death spiral without real GitHub. It is off by default and scoped by session-owner prefix so one spec can opt in without affecting parallel suites. fakegithub also models GitHub's pool-wide opportunistic delivery (M1 Investigation C/D): a job whose session is recycled away before it is acquired is carried to the owner's pending pool and delivered to the next live session, so the post-job re-registration does not strand a job that races a session's recycle window. The AGC is pointed at the fake by setting AGC_EXTRA_* env vars on the GMC, which forwards them into the AGC Deployments it creates (gated by --allow-agc-extra-env=true, set only in e2e). See docs/development/kind-iteration.md for the env-var details.

Tooling: Ginkgo-based suites under cmd/gmc/test/e2e/. A shared cmd/gmc/test/utils/ helper package wraps kubectl, kind, and the fakegithub port-forward. Tier C uses the GitHub REST API to dispatch workflows and poll runs to completion.

What to cover:

Smoke test — single job, single tenant — Create a namespace, apply one ActionsGateway CR into it, dispatch a minimal workflow (echo "hello"), and assert the run completes green with correct log output in the GitHub UI. This is the merge gate.
Parallel job execution — Dispatch a matrix workflow with 10 parallel jobs against a single tenant and assert all 10 complete successfully, verifying the session multiplexer handles concurrent polling without message collisions.
Multi-tenant isolation — Provision two ActionsGateway CRs pointing to different GitHub repositories and different namespaces. Dispatch simultaneous jobs to both. Assert that each job runs in its own namespace, that no Secrets are visible across namespaces, and that one tenant's resource consumption does not affect the other's job throughput.
Proxy egress isolation — Confirm via network observation that GitHub API calls and log stream traffic from both the AGC and worker pods exit through the tenant's proxy Service address, not directly through the cluster NAT. Assert no direct egress to GitHub IPs is observed from AGC or worker pods.
Proxy HA under disruption — Cordon one node hosting a proxy pod and drain it. Assert the PodDisruptionBudget prevents eviction until another proxy pod is scheduled, and that in-flight jobs are not interrupted during the disruption.
Tenant provisioning and deprovisioning — Create a namespace, apply an ActionsGateway CR, run a job successfully, then delete the CR. Assert all GMC-owned resources (proxy Deployment, proxy Service, HPA, PodDisruptionBudget, AGC Deployment, RBAC, NetworkPolicy) are removed but the namespace itself remains intact. The platform-owned ResourceQuota is not GMC-owned and must survive CR deletion (Q130). Re-apply the CR and verify a fresh gateway and proxy pool come up cleanly and can run jobs again.
Job failure propagation — Dispatch a workflow with a deliberately failing step (exit 1) and assert the GitHub UI correctly reflects the failure status. Verify the worker pod exits non-zero and is still cleaned up within the tenant namespace.
Worker pod lifecycle cleanup (Tier B, E2E_AGC_WorkerPodLifecycle) — Two tenants against fakegithub, run Serial (fakegithub session IDs carry no tenant identity, so the suite must not overlap other session-consuming suites). Tenant one uses a fast-exiting worker image and a short completedPodTTL: assert its completed worker pod and job Secret are deleted once the TTL elapses. Tenant two uses an unpullable worker image (reserved .invalid TLD) and a short pendingPodDeadline: assert the genuinely stuck-Pending pod is reaped after the deadline with a WorkerPodStuckPending Warning Event on the RunnerGroup. Also assert worker pods carry a controller OwnerReference to their RunnerGroup — this is the tier where the live GC controller makes that cascade real.
Single-use JIT agent self-heal (Tier B, E2E_AGC_SingleUseSelfHeal) — With fakegithub's single-use simulation enabled (scoped to this spec's RunnerGroup by owner prefix), run maxListeners + 1 sequential jobs. Each acquisition consumes its agent's runner record and kills its session; assert each consumed session is torn down and replaced by a fresh one, and that the final job — the one a pre-Q114 AGC could never serve, having burned every listener slot — still produces a worker pod.
Cross-tenant Secret opacity — After two tenants have completed jobs, assert via namespace inspection that neither tenant's namespace contains Secrets belonging to the other. Assert that the AGC ServiceAccount for tenant A cannot read Secrets in tenant B's namespace.
Resource cleanup under load — Dispatch 50 sequential jobs across 5 tenants and assert zero pod or Secret leaks remain after all runs complete. The check polls all tenant namespaces for residual resources 60 seconds after completion.
Proxy HPA scaling — Dispatch a sustained burst of 50 concurrent jobs against a single tenant with spec.proxy.maxReplicas: 5 and spec.proxy.minReplicas: 2. Assert the HPA scales the proxy pool above minReplicas during the burst, and that replica count returns to minReplicas within 5 minutes of load subsiding. Assert no jobs are dropped during scale-up or scale-down.
GMC restart resilience — Delete and re-create the GMC pod while tenants are active. Assert that in-flight jobs are not interrupted, the GMC correctly re-derives tenant state on restart, no duplicate resources are created during reconciliation, and the proxy HPAs remain intact.
AGC restart resilience — Mid-run on a single tenant, delete and re-create the AGC pod. Assert in-flight jobs are not double-acquired, the AGC converges back to the correct goroutine count within one reconcile cycle, and all traffic continues routing through the proxy pool.
RenewJob under long-running job — Dispatch a workflow that sleeps for 15 minutes. Assert the job completes successfully, confirming the RenewJob loop correctly kept the lock alive across multiple renewal cycles without GitHub cancelling the job.
Rolling AGC upgrade — Start a steady stream of dispatched workflows against a single tenant, then patch the AGC Deployment image to a new tag mid-flight. Assert the rolling update completes successfully, in-flight long polls drop and reconnect (with no duplicated job acquires observed via the broker's audit log), per-job RenewJob loops resume after the new pod starts, and total workflow success rate over the upgrade window stays above 95%. Assert that jobs whose lock expired during the blackout are redelivered and complete on retry.
GitHub IP range reconciliation — Simulate a GitHub meta API response with a new IP range not present in the existing NetworkPolicy. Trigger a GMC reconcile cycle and assert the proxy pod NetworkPolicy egress rules are updated to include the new range. Assert that spec.proxy.managedNetworkPolicy: false suppresses the update.
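For the IP range reconciliation case, the "has the CIDR set changed" check can be sketched as an order-insensitive comparison. This is an illustrative helper, not the IPRangeReconciler itself: normalizeCIDRs and cidrsChanged are hypothetical names, and the usage example relies on documentation-reserved ranges instead of real GitHub ranges.

```go
package main

import (
	"net/netip"
	"sort"
)

// normalizeCIDRs parses each CIDR, masks host bits, removes duplicates,
// and returns the result sorted so two sets can be compared directly.
func normalizeCIDRs(cidrs []string) ([]string, error) {
	seen := map[string]struct{}{}
	out := make([]string, 0, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, err
		}
		s := p.Masked().String()
		if _, dup := seen[s]; dup {
			continue
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	sort.Strings(out)
	return out, nil
}

// cidrsChanged reports whether fetched differs from existing when both
// are treated as sets of normalized prefixes.
func cidrsChanged(existing, fetched []string) (bool, error) {
	a, err := normalizeCIDRs(existing)
	if err != nil {
		return false, err
	}
	b, err := normalizeCIDRs(fetched)
	if err != nil {
		return false, err
	}
	if len(a) != len(b) {
		return true, nil
	}
	for i := range a {
		if a[i] != b[i] {
			return true, nil
		}
	}
	return false, nil
}
```

Comparing normalized sets means a reordered or duplicated meta API response does not trigger a needless NetworkPolicy patch, while a genuinely new range does.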
