High-Scale Virtualized GitHub Actions Gateway — Design Documentation
This folder contains the full system design for the GitHub Actions Gateway, organized into focused documents with cross-references. All documents are intended to render correctly on GitHub.
Table of Contents
- Executive Summary & Problem Statement
  - For Executive Leadership: GPU Utilization & Cost Justification
  - For Tenant Teams: Self-Service & Cost Ownership
  - For Platform Engineering: Operational Leverage & Shift Left
  - Overview for Architects & Engineers
- Core Architectural Components
  - 2.1 Tier 1 — Gateway Manager Controller (GMC)
  - 2.2 Tier 2 — Actions Gateway Controller (AGC)
  - 2.3 Tier 3 — Egress Proxy Pool
  - 2.4 Tier 4 — Ephemeral Worker Pod
  - 2.5 Observability
  - 2.6 Upgrade Strategy
- API & Data Contract Specifications
  - 3.1 Kubernetes CRD Schemas
  - 3.2 GitHub App Credentials Secret Schema
  - 3.3 Re-implemented Broker API Endpoints
  - 3.4 Broker Payload Blueprints (Go Structs)
  - 3.5 GitHub API Rate Limit Budget
- Operational Lifecycle Execution Flows
  - 4.1 Tenant Provisioning Flow (GMC)
  - 4.2 Job Execution Flow (AGC)
- Security & Threat Risk Assessment
  - 5.1 GMC-Level Threats (Cluster-Scoped)
  - 5.2 AGC & Proxy-Level Threats (Namespace-Scoped)
  - 5.3 Security Profiles and the Privileged Opt-In
- Implementation Phasing & Delivery Milestones
  - Milestone 1: Wire Protocol Probe (Days 1–4)
  - Milestone 2: AGC Controller & Reconciler (Days 5–10)
  - Milestone 3: Worker Pod & Pipe Handoff (Days 11–16)
  - Milestone 4: Gateway Manager Controller + Proxy (Days 17–22)
  - Milestone 5: Hardening & Load Testing (Days 23–26)
- Test Plan
  - 7.1 Unit Tests
  - 7.2 Integration Tests
  - 7.3 End-to-End Tests
- Glossary
- Appendix A — Capacity Targets & SLOs
- Appendix B — Worker Isolation Runtime (Optional)
- Appendix C — AI-Assisted Implementation Notes (Optional)
- Appendix D — Alternatives Considered
- Appendix E — Capacity Planning & RunnerGroup Design
- Appendix F — Cost Model
- Appendix G — Optional Future Enhancements
- Network Architecture
Operations
- Getting Started — initial setup, credential rotation
- Observability — metrics reference, alert rules
- Troubleshooting — symptom → diagnosis → resolution
- Runbook — day-2 operations, incident response
- Upgrade & Rollback — per-component upgrade procedures
- Tenant Onboarding — checklist for onboarding a new tenant team
Reading Paths by Role
Architect — reviewing the overall design: start with 01-executive-summary.md, then 02-architecture.md, then 03-api-contracts.md. Read 04-operational-flows.md and 05-security.md for depth.
Platform engineer — deploying or operating the system: read Getting Started first, then 02-architecture.md §2.1 (GMC), Appendix A (SLOs), Observability, Runbook, and Upgrade & Rollback.
Security engineer — reviewing trust boundaries and threats: read 05-security.md, 02-architecture.md §2.4 (worker isolation), 03-api-contracts.md §3.2 (credentials), and Appendix B (runtime hardening).
Tenant team — authoring RunnerGroup configs: read Getting Started, 03-api-contracts.md §3.1 (CRD schemas), and Appendix E (sizing guidance).
System Overview
The gateway is a four-tier system for running GitHub Actions self-hosted runners at scale on Kubernetes:
| Tier | Component | Scope | Role |
|---|---|---|---|
| 1 | Gateway Manager Controller (GMC) | Cluster | Watches ActionsGateway CRs, provisions per-tenant resources |
| 2 | Actions Gateway Controller (AGC) | Namespace | Multiplexes GitHub broker sessions, acquires jobs, spawns worker pods |
| 3 | Egress Proxy Pool | Namespace | Stateless HTTPS CONNECT proxy pool; isolated egress IPs per tenant |
| 4 | Ephemeral Worker Pod | Namespace | Single-use pod that executes exactly one workflow job |
For a quick orientation, start with 01-executive-summary.md, then follow links from there.