Why this page exists. csoh.org is a community for cloud security practitioners, run by people who do this for a living. It would be embarrassing to host the site without taking the deployment seriously. This page is the receipt: a control-by-control walkthrough of the GCP setup you can adapt for your own static site, with all the Terraform and GitHub Actions YAML linked so you can read what we actually run.
Audience: security engineers, cloud architects, and anyone deploying a small thing to GCP and wanting to do it well. Familiarity with GCP basics, IAM, and CI/CD assumed; we link the glossary terms when we use jargon.
The threat model for a static site
Before any controls: what are we actually defending against? Static sites have a small attack surface compared to anything stateful. There is no database to inject into, no auth flow to bypass, no session to hijack, no per-user data to exfiltrate. That eliminates whole chapters of the OWASP Top 10 from worry.
What's left, ranked roughly by likelihood:
- Supply-chain compromise of the build. Attacker tampers with the base image, an Action we use, or a dependency such that a malicious build artifact ends up in production. Mitigated by digest-pinning, locked Actions, immutable image tags, container scanning.
- Credential theft from the deploy pipeline. A leaked CI token can push malicious bytes that look authentic. Mitigated by Workload Identity Federation (no long-lived credential exists to leak), short-lived everything, repo-scoped trust, branch-protection on the deployer workflow file itself.
- DNS / TLS attacks at the edge. Misissued cert, cache poisoning, downgrade. Mitigated by HSTS preload, modern TLS policy (1.2+), managed certs that auto-renew, registrar lock + Cloudflare 2FA on DNS.
- Defacement. Attacker who got into the build pipeline pushes embarrassing content. Mitigated by everything in (1) and (2), plus a fast-rollback story (Cloud Run revisions pin specific image SHAs; one `gcloud` command moves traffic back to the previous revision).
- Volumetric DDoS. Mitigated by Cloud CDN absorption, Cloud Armor adaptive L7 protection, and per-IP rate limiting. Static content amplifies CDN effectiveness.
- Bot scraping / abuse. Mitigated weakly: we don't really care about scraping the public site, but Cloud Armor's WAF rules handle classic SQLi/XSS/LFI probes that bots throw at every endpoint.
Things we explicitly do not defend against because they don't apply: authenticated session theft, broken access control, business-logic abuse, privileged-pivot from an application server. The corollary: anyone reading this for a non-static site needs more controls than what's on this page.
Architecture at a glance
Browser
   │
   ▼
Cloudflare (proxy mode for csoh.org/www, terminates browser TLS,
            absorbs traffic at edge before our origin sees it)
   │ origin TLS
   ▼
Google Cloud Global HTTPS Load Balancer (anycast IP)
 ├── Modern TLS policy - TLS 1.2+ only, restricted ciphers
 ├── Managed cert (auto-renew, ACME HTTP-01)
 ├── HTTP → HTTPS redirect (301, separate forwarding rule)
 ├── Cloud Armor - OWASP CRS WAF + per-IP rate limit + adaptive L7 DDoS
 └── Cloud CDN - edge cache for static assets
   │
   ▼
Serverless NEG → Cloud Run (csoh-site)
 ├── ingress = INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER (no public Run URL)
 ├── runtime SA: csoh-run-runtime (zero IAM roles)
 └── nginx:1.27-alpine (digest-pinned) + apk upgrade + custom CSP/HSTS/COOP/CORP

Build path (out of band):
GitHub push → Actions runner → OIDC token → WIF exchange → 1-hour GCP token
  → docker build → Trivy scan → Artifact Registry (immutable tags)
  → gcloud run deploy (revision pinned to image SHA)

Telemetry:
Cloud Armor decisions, LB 4xx/5xx, IAM admin activity, audit logs
  → Cloud Logging sink → 400-day retention bucket
Every box on that diagram is provisioned by Terraform (infra/terraform/). There is no click-ops; the GCP console is read-only for normal operation.
Identity & access (the WIF story)
The single most consequential control on this page is that there is no service account key anywhere in the system. GitHub Actions does not have a JSON key to push containers, deploy Cloud Run revisions, or do anything else against GCP. Instead:
- The workflow declares `permissions: id-token: write`. GitHub will mint an OIDC token at run-time, signed by GitHub's identity provider.
- The token's claims include `repository`, `ref`, `workflow`, `job_workflow_ref`, etc.: facts about which workflow run is asking.
- A Workload Identity Pool in our GCP project trusts that GitHub OIDC issuer.
- An attribute condition on the pool gates the exchange:
  attribute_condition = "assertion.repository == 'CloudSecurityOfficeHours/csoh.org'"
  Any other repository trying to authenticate to this pool is rejected outright. A leaked workflow file from a different repo cannot mint a token for our project.
- The exchanged GCP access token is short-lived (~1 hour) and scoped via `roles/iam.workloadIdentityUser` to impersonating one service account: `csoh-deployer`.
- `csoh-deployer` has the minimum roles to do its job: `roles/run.admin` on the project, `roles/artifactregistry.writer`, and `roles/iam.serviceAccountUser` on the runtime SA (so it can set the runtime identity on a deploy). Nothing else.
- The runtime service account that the actual container runs as (`csoh-run-runtime`) has zero IAM roles. The container is static nginx that makes no GCP API calls. There is nothing to grant.
What a leaked workflow log gets an attacker: ~1 hour of access to push images and deploy revisions in one project. No persistence, no lateral movement, no read access to other projects. There is no key in GitHub Secrets to rotate, no expiry to track, no offboarding ritual. The Terraform that wires this up is in wif.tf, about 30 lines.
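For orientation, a minimal Terraform sketch of that wiring (pool, provider with the attribute condition, and the impersonation binding). The pool, provider, and resource names here are illustrative, not necessarily what wif.tf uses; the attribute condition is the one quoted above:

```hcl
# Trust GitHub's OIDC issuer -- but only for our repository.
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-pool" # illustrative ID
}

resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }

  # Map claims from the GitHub OIDC token into GCP attributes.
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Reject the token exchange outright for any other repository.
  attribute_condition = "assertion.repository == 'CloudSecurityOfficeHours/csoh.org'"
}

# Let identities from the pool (our repo only, per the condition above)
# impersonate the deployer SA -- the only authority the pipeline gets.
resource "google_service_account_iam_member" "deployer_wif" {
  service_account_id = google_service_account.deployer.name # assumes a csoh-deployer SA resource
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/CloudSecurityOfficeHours/csoh.org"
}
```

The `principalSet` member string is what narrows impersonation to tokens whose `repository` attribute matches; the attribute condition on the provider rejects everything else one step earlier.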
The runtime SA having zero roles is on purpose
It's tempting to grant the container some roles "just in case." Don't. The blast radius of an exploited container is bounded by what its identity can do; if its identity can't do anything, an exploited container can't do anything either. If the day comes when the site needs a backend service, that service gets its own SA with only the roles that service needs, not a single shared "the website" SA whose permissions accumulate over time.
Image supply chain
The container we run in production has to be this nginx with this config and nothing else. The supply chain has three honest places where someone could substitute "this" with "something the attacker prefers":
1. The base image
Unpinned tags are a registry-write vulnerability. FROM nginx:1.27-alpine means "whatever bytes the registry returns when we ask for that tag today", and registries occasionally have bad days. Our Dockerfile pins to a digest:
FROM nginx:1.27-alpine@sha256:65645c7bb6a0661892a8b03b89d0743208a18dd2f3f17a54ef4b76fb8e2f2a10
That is content-addressed: the registry cannot serve different bytes without changing the digest, and Docker fails the build if they don't match. The trade-off is that we have to update the digest manually as new base images come out, a deliberate maintenance friction in exchange for tamper-evidence.
2. Stale packages on top of the pinned base
The catch with digest-pinning a base image: the OS packages baked into that digest age. Alpine ships security fixes for libssl, libxml2, libpng, etc. constantly; a digest from 6 months ago has 6 months of accumulated CVEs. We resolve this by running apk upgrade --no-cache immediately after FROM:
RUN apk upgrade --no-cache && \
rm -rf /var/cache/apk/*
Net effect: we start from known bytes (the digest pin) and end with current packages (the apk upgrade). The Trivy scan in the next step verifies we haven't slipped on either dimension.
3. The CI build artifact
Every PR's container goes through Trivy:
trivy image \
--exit-code 1 \
--ignore-unfixed \
--severity HIGH,CRITICAL \
"${IMAGE}"
The build fails on any HIGH or CRITICAL CVE that has a fix available. --ignore-unfixed filters CVEs whose upstreams haven't shipped a patch; those are noise we can't act on, and including them would just train people to ignore the scan output. After the scan passes, the image is pushed to Artifact Registry with two important properties:
- Immutable tags. Once an image is pushed at `csoh-site:<sha>`, the tag cannot be overwritten or moved. There is no way for an attacker (or a careless engineer) to silently swap what bytes that tag refers to.
- SHA-based, not `:latest`. The Cloud Run revision pins a specific SHA tag. Rollback is `gcloud run services update-traffic --to-revisions <name>=100`, and there is zero ambiguity about what bytes the rolled-back revision is running.
The Artifact Registry repo also has a cleanup policy: keep the most recent 30 versions, delete untagged versions older than 7 days. That keeps the storage bill small without losing rollback targets.
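Both properties plus the cleanup policy fit in one Terraform resource. A sketch, assuming a single-region Docker-format repo; the repository ID and region are illustrative:

```hcl
resource "google_artifact_registry_repository" "site" {
  repository_id = "csoh-site"   # illustrative
  location      = "us-central1" # assumed region
  format        = "DOCKER"

  # Once pushed, a tag can never be repointed at different bytes.
  docker_config {
    immutable_tags = true
  }

  # Keep the 30 most recent versions as rollback targets...
  cleanup_policies {
    id     = "keep-recent"
    action = "KEEP"
    most_recent_versions {
      keep_count = 30
    }
  }

  # ...and delete untagged versions older than 7 days (duration in seconds).
  cleanup_policies {
    id     = "delete-stale-untagged"
    action = "DELETE"
    condition {
      tag_state  = "UNTAGGED"
      older_than = "604800s"
    }
  }
}
```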
Edge defenses (LB, WAF, CDN, TLS)
Cloud Run sits behind a Global External HTTPS Load Balancer with the ingress restricted to INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER. Direct HTTPS hits to the Run service URL get a 404 from the GCP frontend; the only valid path to the application is through the LB. That matters because all of the controls below live on the LB; if you could reach the app directly, you'd skip them.
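In Terraform, the ingress restriction and the zero-role runtime identity are a few lines on the service. A sketch; the region and the image reference are placeholders, and the runtime SA resource name is an assumption:

```hcl
resource "google_cloud_run_v2_service" "site" {
  name     = "csoh-site"
  location = "us-central1" # assumed region

  # Refuse direct hits to the *.run.app URL; only the LB path is valid.
  ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"

  template {
    # Runtime identity with zero IAM roles -- nothing to abuse if exploited.
    service_account = google_service_account.runtime.email

    containers {
      # Digest-pinned nginx image from Artifact Registry (placeholder ref).
      image = "us-central1-docker.pkg.dev/PROJECT/csoh-site/csoh-site@sha256:..."
    }
  }
}
```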
Cloud Armor โ the WAF and rate limiter
Defined in cloud_armor.tf. We're running OWASP Core Rule Set rules at sensitivity level 1 (the lowest false-positive setting) against SQLi, XSS, LFI, RFI, and RCE patterns. For a static site that doesn't have SQLi-vulnerable endpoints, this is more about demonstrating the control than blocking real attacks; it does, however, eat the noise of generic bot probes.
The rate limiter is per-source-IP, 600 requests/minute, 10-minute ban on exceed. Important caveat for our setup: because Cloudflare proxies production traffic, Cloud Armor sees Cloudflare's egress IPs rather than real client IPs. The per-IP rate limit therefore applies per-Cloudflare-IP, a meaningfully looser bound than per-end-user. Cloudflare itself does its own rate limiting at the edge, which catches most of what we'd otherwise want Cloud Armor to catch; the Cloud Armor rule mostly protects us from anything that gets past Cloudflare or hits us via the gray-cloud staging hostname. A future improvement is to wire X-Forwarded-For trust into Cloud Armor for Cloudflare's IP ranges so we can rate-limit on the real client IP.

Adaptive L7 DDoS defense is enabled: Cloud Armor will auto-generate signatures for unusual traffic patterns and apply them as additional rules during an actual attack.
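A sketch of what the policy's core rules could look like in Terraform. Rule priorities and the policy name are illustrative, and only one of the preconfigured WAF rules is shown; the thresholds match the prose:

```hcl
resource "google_compute_security_policy" "edge" {
  name = "csoh-edge" # illustrative

  # Per-source-IP rate limit: 600 req/min, 10-minute ban on exceed.
  rule {
    priority = 1000
    action   = "rate_based_ban"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    rate_limit_options {
      conform_action = "allow"
      exceed_action  = "deny(429)"
      enforce_on_key = "IP" # note: sees Cloudflare egress IPs in proxy mode
      rate_limit_threshold {
        count        = 600
        interval_sec = 60
      }
      ban_duration_sec = 600
    }
  }

  # One OWASP CRS rule at the lowest-sensitivity (fewest false positives) setting.
  rule {
    priority = 2000
    action   = "deny(403)"
    match {
      expr {
        expression = "evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 1})"
      }
    }
  }

  # Auto-generated signatures during volumetric L7 attacks.
  adaptive_protection_config {
    layer_7_ddos_defense_config {
      enable = true
    }
  }
}
```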
Cloud CDN
Cache mode CACHE_ALL_STATIC, default TTL 1 hour, max 24 hours, with negative_caching on so 404s and 410s cache too (this matters during a deploy: stale 404s during cache invalidation aren't expensive). serve_while_stale is 24 hours: if the origin (Cloud Run) is unreachable, the CDN keeps serving the last known good version for up to a day. That's a free availability buffer for a static site.
The performance side of the CDN is also doing the security work of absorbing floods. Volumetric traffic hits the edge, not Cloud Run.
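Those cache settings live on the backend service's cdn_policy. A sketch with the TTLs from the prose; the backend and NEG resource names are illustrative:

```hcl
resource "google_compute_backend_service" "site" {
  name                  = "csoh-backend" # illustrative
  load_balancing_scheme = "EXTERNAL_MANAGED"
  protocol              = "HTTPS"
  enable_cdn            = true

  backend {
    # The serverless NEG fronting the Cloud Run service.
    group = google_compute_region_network_endpoint_group.run_neg.id
  }

  cdn_policy {
    cache_mode        = "CACHE_ALL_STATIC"
    default_ttl       = 3600  # 1 hour
    max_ttl           = 86400 # 24 hours
    negative_caching  = true  # cache 404s/410s too
    serve_while_stale = 86400 # keep serving for a day if origin is down
  }
}
```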
TLS: two layers
Browser โ Cloudflare: Cloudflare terminates browser TLS using its Universal SSL cert, valid for csoh.org and www.csoh.org. This is what end users see; it's free, auto-renews, and is signed by a CA the browser already trusts.
Cloudflare โ GCP LB origin: Cloudflare connects to the GCP load balancer over TLS on its origin connection. The LB has a Google-managed cert and a Modern TLS policy attached (TLS 1.2 minimum, restricted cipher suite). Cloudflare's SSL/TLS mode for the origin connection determines how strict the validation of that origin cert is โ Full (strict) is the right setting once the GCP cert covers the origin hostname; Full is a defensible middle-ground while you stabilise.
Why we don't put production hostnames in the GCP managed cert: Google managed certs use HTTP-01 ACME validation, which requires the domain to resolve directly to the LB. While Cloudflare proxies csoh.org, Google's ACME validator can't see the LB and the cert hangs in FAILED_NOT_VISIBLE. The GCP cert therefore only covers gcp.csoh.org, which is a DNS-only Cloudflare record pointing straight at the LB IP.
We send HSTS with includeSubDomains; preload from nginx, plus X-Content-Type-Options, X-Frame-Options: DENY, Referrer-Policy, Permissions-Policy, Cross-Origin-Opener-Policy, Cross-Origin-Resource-Policy, and a strict CSP (no inline scripts, no unsafe-eval).
The HTTP listener (port 80) on the LB is a bare 301 redirect to HTTPS, served by a separate URL map. Cloudflare also enforces "Always Use HTTPS" at the edge, so plain HTTP rarely reaches our origin.
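The two LB-side pieces mentioned above (the TLS floor and the redirect-only port-80 listener) are small Terraform resources. A sketch; resource names are illustrative:

```hcl
# TLS 1.2 minimum with Google's MODERN cipher profile,
# attached to the HTTPS target proxy.
resource "google_compute_ssl_policy" "modern" {
  name            = "csoh-modern-tls" # illustrative
  profile         = "MODERN"
  min_tls_version = "TLS_1_2"
}

# The port-80 listener's URL map: nothing but a 301 to HTTPS.
resource "google_compute_url_map" "http_redirect" {
  name = "csoh-http-redirect" # illustrative
  default_url_redirect {
    https_redirect         = true
    redirect_response_code = "MOVED_PERMANENTLY_DEFAULT" # 301
    strip_query            = false
  }
}
```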
CI/CD pipeline
The full deploy path is in gcp-deploy.yml. Walking it top to bottom:
- Triggers on pushes to `main` matching specific paths (HTML, CSS, JS, Dockerfile, nginx.conf, the workflow file itself), plus `workflow_dispatch` for manual runs.
- Concurrency group `gcp-deploy` with `cancel-in-progress: true`: if a newer commit lands while an older deploy is mid-flight, the older one is cancelled. Prevents the stale-content race where an older deploy publishes after a newer one finishes.
- Permissions block scopes the auto-injected `GITHUB_TOKEN` to `contents: read` + `id-token: write`. Nothing else.
- Environment gate: the job declares `environment: production`. The repo's `production` environment is configured to allow deployments only from `main`, and required reviewers can be added. A PR from a fork cannot run this job.
- WIF auth via `google-github-actions/auth`, pinned to a specific commit SHA (not a tag). The action exchanges the OIDC token for a 1-hour GCP access token.
- Docker build with descriptive `org.opencontainers.image.*` labels (source repo, revision SHA, build timestamp). These labels are visible on the image in Artifact Registry and make incident triage faster.
- Trivy install + scan: install via Aqua's apt repo (signed package), scan the freshly built image, fail on HIGH/CRITICAL.
- Push to Artifact Registry, single immutable SHA tag.
- Deploy a new Cloud Run revision via `gcloud run deploy`, which atomically swaps traffic. The revision is named (e.g. `csoh-site-00007-abc`); rollback to a previous revision is one CLI call.
What protects the workflow file itself
The workflow can deploy with a lot of authority โ so the file that defines it is an extra-sensitive piece of code. Two layers protect it:
- CODEOWNERS requires `@Nunley` review on any change to `.github/workflows/`, `infra/`, `Dockerfile`, `nginx.conf`, and security docs. The branch ruleset enforces "require Code Owners review": a PR touching these surfaces cannot merge without that explicit approval.
- Branch ruleset on `main` requires PR with 1 approval, blocks deletion and non-fast-forward pushes, and gates merging on three required status checks (`ruff`, `actionlint`, `yamllint`). Combined with the `production` environment's main-only branch policy, a malicious PR cannot reach the deploy code path.
Repository-side hardening that's not GCP-specific but worth naming: secret scanning + push protection are on (any commit containing a known credential pattern is blocked at push time), Dependabot security updates are on, and every third-party Action in every workflow is pinned to a commit SHA, not a tag.
Logging & detection
The default Cloud Logging configuration retains logs for 30 days in the _Default bucket. For security-relevant events that's not enough: incidents are often discovered weeks or months after the fact, especially supply-chain ones. We define a custom 400-day retention bucket and a log sink that routes the events worth keeping into it (logging.tf):
(resource.type="http_load_balancer" AND jsonPayload.enforcedSecurityPolicy.outcome="DENY") OR (resource.type="http_load_balancer" AND httpRequest.status>=400) OR protoPayload.serviceName="iam.googleapis.com" OR protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog"
That filter captures: every Cloud Armor block decision (so we can tune rules and spot real attacks), every 4xx/5xx response from the LB (for both performance and abuse triage), every IAM policy change (the highest-leverage admin events in any GCP project), and the full audit-log stream.
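The bucket-plus-sink pair is compact in Terraform. A sketch; the bucket and sink names are illustrative, and the filter is the one quoted above:

```hcl
# 400-day retention bucket for security-relevant events.
resource "google_logging_project_bucket_config" "security" {
  project        = var.project_id
  location       = "global"
  bucket_id      = "security-events" # illustrative
  retention_days = 400
}

# Copy matching events into the long-retention bucket
# (a sink routes copies; the _Default bucket is unaffected).
resource "google_logging_project_sink" "security" {
  name        = "security-events-sink" # illustrative
  destination = "logging.googleapis.com/${google_logging_project_bucket_config.security.id}"
  filter      = <<-EOT
    (resource.type="http_load_balancer" AND jsonPayload.enforcedSecurityPolicy.outcome="DENY")
    OR (resource.type="http_load_balancer" AND httpRequest.status>=400)
    OR protoPayload.serviceName="iam.googleapis.com"
    OR protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog"
  EOT
}
```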
We do not currently route to a SIEM; for a community site, the cost/benefit doesn't pencil out. For a production SaaS workload, you'd want this same sink plus an export to BigQuery (long-term analytics) and/or Pub/Sub (streaming detection).
What we didn't do (and why)
Listing controls we considered and rejected is more honest than pretending the design is finished. Each of these is a defensible choice for a small static site, and a less-defensible choice as the surface area grows.
- Binary Authorization with attestation. We enabled the API but don't enforce a Binary Auth policy yet. Trivy + immutable AR tags + WIF-restricted push already cover most of what Binary Auth would add for a single-image, single-deployer setup. We'll add it when there's a multi-image / multi-environment story to enforce.
- Gray-cloud DNS (DNS-only Cloudflare records). We tried this: gray cloud lets Cloud Armor see real client IPs, but it forces a TLS-error window every time the GCP managed cert is recreated, because Google's HTTP-01 ACME validation requires DNS-direct resolution. After running into `FAILED_NOT_VISIBLE` on cert SAN expansion mid-cutover, we settled on Cloudflare proxy mode (orange cloud) for production hostnames and gray cloud only for the staging hostname (gcp.csoh.org) where the GCP cert lives. Trade-offs documented in the TLS and Cloud Armor sections above.
- Wiring `X-Forwarded-For` trust into Cloud Armor. This is the natural follow-up to the proxy decision: tell Cloud Armor to trust the `X-Forwarded-For` header from Cloudflare's IP ranges so it rate-limits on the real client IP. We have it on the to-do list; it requires keeping a current allowlist of Cloudflare egress ranges in Terraform, which is a non-zero maintenance ask.
- Custom VPC-SC perimeter. Service Perimeters are heavyweight; they make sense around services holding sensitive data. The runtime SA has no access to anything sensitive, so there's nothing to perimeterize.
- SLSA provenance attestation. Worth doing; on the to-do list. The slsa-github-generator emits a signed build provenance record we could attach to each image and verify before deploy.
- Image signing with cosign. Same shape โ would prove "this image was built by our pipeline" and let us reject unsigned images at deploy time. A natural follow-up to SLSA provenance.
- Distroless or scratch base. nginx-on-alpine has more attack surface than a single static binary in a scratch container. We're staying on nginx-alpine because it gives us the URL-rewriting, header-injection, and config flexibility we use today; the Trivy scan + apk upgrade keeps the package surface honest.
- Multi-region failover. Cloud Run is single-region. The CDN absorbs most of the load, and the site is content-static (no db to replicate), so a region outage means stale content from CDN edges, not downtime. Global multi-region adds cost and complexity that only pays off above our scale.
- SIEM integration. See the logging section above. Logs are retained; alerting is not yet.
Cost
Approximate monthly run-rate at our traffic level:
| Component | Approx / month |
|---|---|
| LB forwarding rules (×2) | ~$18 |
| Cloud Armor policy | ~$5 + $0.75/rule |
| Cloud Run (scale-to-zero) | $0โ2 |
| Artifact Registry storage | < $1 |
| Cloud Logging (low volume) | < $2 |
| Egress + CDN | ~$1โ3 |
| Total | ~$30/mo |
For comparison: shared LiteSpeed hosting (the previous setup) was a small flat fee and provided none of the controls on this page. We're paying ~10× the hosting cost to run the site as a security demonstration. That's a deliberate trade-off; for a private project with no showcase value, the cheaper static-host-on-someone-else's-server is a perfectly reasonable answer.
Further reading
- The Terraform itself: infra/terraform/. Every resource above is one file. Start with wif.tf if you want the keyless story; cloud_armor.tf for the WAF rules.
- The deploy workflow: gcp-deploy.yml. About 130 lines, heavily commented.
- The site security model: SECURITY.md in the repo. Covers the headers, the CI auth model, the secrets in use, and the rotation cadence.
- GitHub side: see How We Use GitHub Actions for the full workflow tour.
- External:
  - GCP: Workload Identity Federation overview
  - google-github-actions/auth: the action that does the OIDC exchange
  - GCP: Cloud Armor overview
  - GCP: Cloud Run ingress controls
  - SLSA: the provenance framework we'll add next
Questions?
Bring them to Friday Zoom. Several of our regulars run nontrivial GCP setups (multi-project orgs, Binary Authorization in production, signed build artifacts) and are happy to walk through specifics for your environment. The meeting recaps often include cloud-deployment war stories.