Edge Certificates at Scale: How to Manage Millions of TLS Certificates for Micro‑Apps
Tactical playbook for managing millions of TLS certs: ACME automation, HSM key custody, wildcard strategy, CT logging and DNS best practices.
Hook: Your micro‑app fleet needs TLS that doesn’t break at 2AM
Managing thousands — or millions — of tiny apps means millions of TLS decisions: ownership checks, renewals, private keys, CT logs, DNS fights and a never‑ending stream of failed validations. If your current setup still relies on manual issuance, clipboard copy/paste DNS changes, or a single wildcard that’s a single point of blast radius, you’ll be firefighting cert issues more than shipping product.
Inverted summary: What to do first (the 30,000‑ft view)
At scale, treat certificates as infrastructure: automate issuance with ACME, keep private keys in an HSM or HSM‑backed KMS, use wildcards strategically (not everywhere), require DNS‑01 for custom domains, confirm that every public certificate is logged to Certificate Transparency (CT), and build observability for renewals, OCSP stapling and CT monitoring. Below you’ll find tactical patterns, code pointers, and an operational playbook you can implement this week.
Why this matters in 2026 — quick context
By 2026 the majority of web traffic uses TLS 1.3 and HTTP/3 (QUIC), and CDNs and edge platforms increasingly offer on‑edge certificate issuance. Browsers enforce CT for publicly trusted certificates, and DNS‑based verification (DNS‑01) is the dominant method for automated wildcard issuance. OCSP is in transition: stapling still helps where your CA runs OCSP responders, but some CAs (Let’s Encrypt among them) have retired OCSP in favor of CRLs, so confirm what your CA supports. Late‑2025 industry developments made edge CA/CDN integrations and HSM‑backed signing common in high‑scale platforms, so the bar for automation and security has risen.
Key problems you’ll solve
- Eliminate expired cert outages with reliable renewals and observability.
- Reduce issuance friction for custom domains using DNS‑API automations.
- Keep private keys safe under HSM controls and audit trails.
- Detect and respond to misissued certificates via CT monitoring.
Architecture patterns for “edge certificates at scale”
Pick one of these depending on your business model (multi‑tenant platform vs. white‑label provider):
1) Platform‑owned wildcard for subdomain scale
Use a platform wildcard like *.platform.example for all user apps under your domain. Pros: single wildcard, fewer issuances. Cons: cannot cover user custom domains and creates a single key blast radius.
2) Delegation + ALIAS/CNAME pattern (recommended for micro‑apps)
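Require customer domains to CNAME to user-123.apps.platform.example. The CNAME routes traffic to your edge, but browsers validate the certificate against the hostname the user requested, so your platform wildcard covers only user-123.apps.platform.example and not the customer’s own domain. What the pattern buys you is automation: once traffic lands on your edge you can complete HTTP‑01 or TLS‑ALPN‑01 without further DNS changes by the customer, and a second CNAME on _acme-challenge lets you run DNS‑01 on their behalf. Apex domains need ALIAS/ANAME records or DNS flattening, because a CNAME cannot sit at the zone apex. Apps that stay on platform hostnames need no per‑domain issuance at all, which reduces external CA churn and simplifies renewals.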
3) Per‑domain certs for customer ownership (ACME per domain)
When customers bring their own domains, best practice is to issue a public cert per domain via ACME (DNS‑01 challenge). This gives clear ownership proof and aligns with browser requirements for public trust.
Tactical guide to wildcard certificates
Wildcards are tempting: one cert, many subdomains. But at scale you must weigh risk, scope and operational complexity.
- Use wildcards for internal or platform subdomains (e.g., *.apps.platform.example) where you control DNS and can rotate keys centrally.
- Do not use a single wildcard across multiple tenants unless you accept the security blast radius and have strong access controls and HSM protection.
- For user‑provided domains, require DNS verification and issue per‑domain certs with short lifetimes (90 days is standard; shorter is fine with automation).
- Fallback strategy: if a domain can’t be verified, provide an app‑level TLS hostname under your domain so the app remains reachable while ownership is resolved.
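The wildcard rules above reduce to a small decision: is this hostname already covered by a platform wildcard, or does it need its own certificate? The sketch below shows one way to make that check in Go. The function names and the "platform-wildcard" / "per-domain" labels are illustrative assumptions, not part of any library. It relies on the fact that a TLS wildcard matches exactly one leftmost label.

```go
package main

import (
	"fmt"
	"strings"
)

// coveredByWildcard reports whether host is matched by a wildcard pattern such
// as "*.apps.platform.example". A TLS wildcard stands in for exactly one
// leftmost label, so "a.b.apps.platform.example" is not covered.
func coveredByWildcard(host, pattern string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	pattern = strings.ToLower(strings.TrimSuffix(pattern, "."))
	if !strings.HasPrefix(pattern, "*.") {
		return false
	}
	suffix := pattern[1:] // e.g. ".apps.platform.example"
	if !strings.HasSuffix(host, suffix) {
		return false
	}
	label := strings.TrimSuffix(host, suffix)
	return label != "" && !strings.Contains(label, ".")
}

// certPlan returns "platform-wildcard" when the existing wildcard suffices and
// "per-domain" when a dedicated certificate has to be issued.
func certPlan(host, platformWildcard string) string {
	if coveredByWildcard(host, platformWildcard) {
		return "platform-wildcard"
	}
	return "per-domain"
}

func main() {
	const wildcard = "*.apps.platform.example"
	for _, h := range []string{
		"user-123.apps.platform.example",
		"a.b.apps.platform.example",
		"shop.customer.example",
	} {
		fmt.Printf("%-32s %s\n", h, certPlan(h, wildcard))
	}
}
```

Running the check at onboarding time keeps per-domain issuance (and the rate-limit budget it consumes) for the hostnames that actually need it.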
ACME automation at scale — patterns and pitfalls
ACME is the backbone for automated issuance. Scaling it requires careful orchestration.
Which ACME challenges to use
- DNS‑01: Required for wildcard certs and safest for custom domains; relies on DNS provider API access.
- HTTP‑01: Simpler for non‑wildcards when you control the host, but fragile when the host sits behind CDNs or proxy layers that intercept or redirect the challenge request.
- TLS‑ALPN‑01: Useful for edge cases, but more complex to automate at scale.
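A certificate service usually encodes these trade-offs as a small policy function. The following is a minimal sketch, assuming a hypothetical DomainFacts struct that summarizes what the platform knows about a name. The preference order (DNS‑01 first, then HTTP‑01, then TLS‑ALPN‑01) is one reasonable reading of the list above, not a rule from the ACME specification.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type Challenge string

const (
	DNS01     Challenge = "dns-01"
	HTTP01    Challenge = "http-01"
	TLSALPN01 Challenge = "tls-alpn-01"
)

// DomainFacts is a hypothetical summary of what the platform knows about a name.
type DomainFacts struct {
	Name          string // may start with "*." for a wildcard request
	CanWriteTXT   bool   // DNS API access, or a delegated _acme-challenge CNAME
	Port80AtEdge  bool   // plain HTTP for this name reaches our edge
	Port443AtEdge bool   // TLS for this name terminates at our edge
}

var errNoChallenge = errors.New("no usable ACME challenge for this domain")

// chooseChallenge prefers DNS-01 and falls back to HTTP-01, then TLS-ALPN-01.
func chooseChallenge(d DomainFacts) (Challenge, error) {
	switch {
	case d.CanWriteTXT:
		return DNS01, nil // works for wildcards and for hosts behind proxies
	case strings.HasPrefix(d.Name, "*."):
		// Let's Encrypt, for example, issues wildcards only via DNS-01.
		return "", fmt.Errorf("%s: wildcard needs DNS-01: %w", d.Name, errNoChallenge)
	case d.Port80AtEdge:
		return HTTP01, nil
	case d.Port443AtEdge:
		return TLSALPN01, nil
	default:
		return "", fmt.Errorf("%s: %w", d.Name, errNoChallenge)
	}
}

func main() {
	for _, d := range []DomainFacts{
		{Name: "*.customer.example", CanWriteTXT: true},
		{Name: "shop.customer.example", Port80AtEdge: true},
		{Name: "*.customer.example", Port80AtEdge: true},
	} {
		c, err := chooseChallenge(d)
		fmt.Printf("%-24s challenge=%q err=%v\n", d.Name, c, err)
	}
}
```

Returning a wrapped sentinel error lets callers use errors.Is to route unverifiable domains to a fallback hostname under the platform domain.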
ACME client choices and orchestration
For scale, don’t run ad‑hoc Certbot processes. Build a certificate service that acts as the CA broker. Options:
- acme.sh — great for lightweight scripting and DNS providers.
- lego — native Go library, excellent if your infra is Go‑centric.
- step-ca (Smallstep) — run your own CA that speaks ACME and provides enterprise controls; useful when you need internal and external flows.
Rate limits and CA considerations
Public CAs impose rate limits. Let’s Encrypt (widely used) has per‑domain and per‑account limits, so plan to batch requests, reuse wildcards when safe, or work with a commercial CA or CDN partner for higher quotas. Where your CA lets a valid ACME authorization be reused for a period, reuse it instead of re‑running the challenge on every order.
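One way to stay under a CA's limits is to reserve a slot before sending each order and queue whatever does not fit. The sketch below is a deliberately simple fixed-window counter. The IssuanceBudget type and its methods are hypothetical, the limit is a parameter you would take from your CA's published documentation, and real CA limits often use sliding windows, so treat this as a conservative approximation.

```go
package main

import (
	"fmt"
	"sync"
)

// IssuanceBudget is a fixed-window counter keyed by registered domain.
// limit is whatever your CA documents for the window; it is a parameter here,
// not a statement about any particular CA.
type IssuanceBudget struct {
	mu    sync.Mutex
	limit int
	used  map[string]int
}

func NewIssuanceBudget(limit int) *IssuanceBudget {
	return &IssuanceBudget{limit: limit, used: make(map[string]int)}
}

// Reserve claims one issuance slot for key and reports whether the order may
// be sent now. On false the caller should queue the request for a later window.
func (b *IssuanceBudget) Reserve(key string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.used[key] >= b.limit {
		return false
	}
	b.used[key]++
	return true
}

// Reset starts a new window; call it from a scheduler at each window boundary.
func (b *IssuanceBudget) Reset() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.used = make(map[string]int)
}

func main() {
	budget := NewIssuanceBudget(2)
	for i := 1; i <= 3; i++ {
		ok := budget.Reserve("customer.example")
		fmt.Printf("order %d for customer.example allowed: %v\n", i, ok)
	}
	budget.Reset()
	fmt.Println("after reset:", budget.Reserve("customer.example"))
}
```

The key is assumed to be the registered domain (for example customer.example for shop.customer.example); computing it correctly requires a public suffix list, which is left out of this sketch.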
Practical ACME orchestration flow
- Client requests a certificate from your certificate service with domain and policy.
- Service decides challenge type (prefers DNS‑01 for custom domains).
- If DNS‑01: service triggers DNS provider API to create TXT records and monitors propagation (query DNS to verify the record rather than relying on a fixed sleep).
- Complete ACME challenge, receive cert chain.
- Store cert metadata, push the cert to edge nodes or CDN, enable OCSP stapling where the CA supports it, verify SCTs and register the cert with your CT monitoring, and alert on success or failure.
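The flow above can be sketched as an ordered pipeline with the side effects injected as functions, which makes the ordering and cleanup behavior easy to test with fakes. Every name here (Steps, Issue, trace) is a hypothetical illustration and not an API from lego or any other ACME client.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Steps holds the side effects as plain functions so the flow can be tested
// with fakes. All five are hypothetical hooks, not a real library API.
type Steps struct {
	CreateTXT      func(domain string) error
	WaitPropagated func(domain string) error
	Validate       func(domain string) error // ACME validation and finalization
	Deploy         func(domain string) error // store metadata, push to edge
	CleanupTXT     func(domain string)
}

// Issue runs the DNS-01 flow in order and stops at the first failure. Once the
// TXT record has been created it is removed on every path.
func Issue(domain string, s Steps) error {
	if err := s.CreateTXT(domain); err != nil {
		return fmt.Errorf("create TXT for %s: %w", domain, err)
	}
	defer s.CleanupTXT(domain)
	if err := s.WaitPropagated(domain); err != nil {
		return fmt.Errorf("propagation for %s: %w", domain, err)
	}
	if err := s.Validate(domain); err != nil {
		return fmt.Errorf("validate %s: %w", domain, err)
	}
	if err := s.Deploy(domain); err != nil {
		return fmt.Errorf("deploy %s: %w", domain, err)
	}
	return nil
}

// trace runs Issue with fake steps, failing at the named step ("" = none),
// and returns the order in which the steps ran.
func trace(failAt string) (string, error) {
	var ran []string
	step := func(name string) func(string) error {
		return func(string) error {
			ran = append(ran, name)
			if name == failAt {
				return errors.New("injected failure")
			}
			return nil
		}
	}
	err := Issue("shop.customer.example", Steps{
		CreateTXT:      step("create"),
		WaitPropagated: step("wait"),
		Validate:       step("validate"),
		Deploy:         step("deploy"),
		CleanupTXT:     func(string) { ran = append(ran, "cleanup") },
	})
	return strings.Join(ran, ","), err
}

func main() {
	for _, failAt := range []string{"", "validate"} {
		order, err := trace(failAt)
		fmt.Printf("failAt=%q ran=%s err=%v\n", failAt, order, err)
	}
}
```

Deferring the cleanup right after the record is created means a failed validation never leaves stale challenge records in the customer's zone.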
HSMs and secure key management: how to avoid key leakage
Certificate security is mostly private‑key security. At scale, keep keys out of disk backups and developers’ laptops.
Where to store keys
- Root and intermediate CA keys: Always in HSM (on‑prem HSM with PKCS#11, or cloud HSM/KMS like AWS CloudHSM, Google Cloud HSM or Azure Key Vault HSM).
- Leaf keys: You can generate keys in HSM or generate ephemeral keys and wrap them with HSM‑protected wrapping keys. Full HSM generation is preferable for high‑security tenants.
- Edge private keys: For TLS termination at the true edge (multiple POPs), either use HSM‑backed signing or provision wrapped keys to each POP over encrypted, mutually authenticated channels.
Performance considerations
HSM calls are slower than software signing. Avoid per‑request HSM signing at the edge. Instead:
- Use HSM for root/intermediate signing and key wrapping.
- Issue short‑lived leaf certificates signed by an intermediate whose private key is HSM‑protected but used via a low‑latency signing pool (e.g., HSM cluster with signing queues).
- Generate leaf keys on the edge where policy allows, then have HSM sign CSR asynchronously with caching for frequent renewals.
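For platforms that run their own intermediate, the last pattern (leaf key generated at the edge, CSR signed centrally) maps onto Go's standard library because x509.CreateCertificate accepts any crypto.Signer as the CA key. The sketch below uses a throwaway in-memory CA purely so it runs without hardware. In production the signer would be backed by your HSM or KMS, and whether your vendor's client exposes a crypto.Signer is an assumption to verify. The function names are illustrative.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCSR generates a leaf key locally (e.g. on an edge node) plus a DER CSR.
func newCSR(host string) (*ecdsa.PrivateKey, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.CertificateRequest{DNSNames: []string{host}}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	return key, der, err
}

// signCSR issues a short-lived leaf; caKey may be an HSM- or KMS-backed signer.
func signCSR(csrDER []byte, ca *x509.Certificate, caKey crypto.Signer, ttl time.Duration) ([]byte, error) {
	csr, err := x509.ParseCertificateRequest(csrDER)
	if err != nil {
		return nil, err
	}
	if err := csr.CheckSignature(); err != nil {
		return nil, err // requester must prove possession of the leaf key
	}
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 126))
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: serial,
		DNSNames:     csr.DNSNames,
		NotBefore:    time.Now().Add(-5 * time.Minute), // tolerate clock skew
		NotAfter:     time.Now().Add(ttl),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	return x509.CreateCertificate(rand.Reader, tmpl, ca, csr.PublicKey, caKey)
}

// demoCA builds a throwaway software CA standing in for an HSM-held intermediate.
func demoCA() (*x509.Certificate, crypto.Signer, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo intermediate"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(48 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// issueAndVerify runs the round trip and checks that the leaf chains to the CA.
func issueAndVerify(host string) error {
	ca, caKey, err := demoCA()
	if err != nil {
		return err
	}
	_, csrDER, err := newCSR(host)
	if err != nil {
		return err
	}
	leafDER, err := signCSR(csrDER, ca, caKey, 24*time.Hour)
	if err != nil {
		return err
	}
	leaf, err := x509.ParseCertificate(leafDER)
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err = leaf.Verify(x509.VerifyOptions{Roots: roots, DNSName: host})
	return err
}

func main() {
	fmt.Println("verify error:", issueAndVerify("user-123.apps.platform.example"))
}
```

Because the CA key is only ever touched through the Signer interface, swapping the demo key for a hardware-backed signer should not change signCSR. This applies to a private or platform-run CA; publicly trusted leaf certificates still come from your public CA via ACME.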
APIs and integrations
Use standard interfaces: PKCS#11 for on‑prem HSMs, vendor SDKs or KMS signing APIs for cloud HSM, and KMIP or dedicated signing microservices for enterprise setups. Maintain strict IAM and audit logs for key usage.
Certificate Transparency (CT) at scale — monitoring and response
CT is non‑negotiable for public certificates. Browsers require signed certificate timestamps (SCTs) from public logs for trust. At scale you must:
- Ensure every public certificate is submitted to multiple CT logs and has SCTs.
- Stream CT entries into your observability pipeline (use CertStream, crt.sh APIs, or Google’s CT log interfaces).
- Detect misissuance for your namespaces and for customer domains that affect you and trigger automated revocations or notifications.
Tip: Don’t assume your CA submits certificates to the same set of logs forever. Audit SCTs on issuance and use CT monitors to detect drift.
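Auditing SCTs at issuance can start with a presence check on the embedded SCT list extension, which RFC 6962 assigns the OID 1.3.6.1.4.1.11129.2.4.2. The sketch below only checks that the extension exists. It does not parse the list or verify log signatures, and the testCert helper attaches placeholder bytes that are not real SCTs. Both helper names are illustrative.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
	"math/big"
	"time"
)

// OID of the X.509 extension that carries embedded SCTs (RFC 6962).
var oidEmbeddedSCTs = asn1.ObjectIdentifier{1, 3, 6, 1, 4, 1, 11129, 2, 4, 2}

// hasEmbeddedSCTs reports whether the certificate carries a non-empty SCT
// list extension. It does not verify SCT signatures or log identities.
func hasEmbeddedSCTs(cert *x509.Certificate) bool {
	for _, ext := range cert.Extensions {
		if ext.Id.Equal(oidEmbeddedSCTs) && len(ext.Value) > 0 {
			return true
		}
	}
	return false
}

// testCert builds a self-signed certificate for the demo. When withSCT is true
// it attaches placeholder bytes under the SCT OID; these are NOT real SCTs.
func testCert(withSCT bool) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "demo.example"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	if withSCT {
		tmpl.ExtraExtensions = []pkix.Extension{
			{Id: oidEmbeddedSCTs, Value: []byte{0x04, 0x00}},
		}
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// sctCheck builds a demo certificate and runs the presence check on it.
func sctCheck(withSCT bool) (bool, error) {
	cert, err := testCert(withSCT)
	if err != nil {
		return false, err
	}
	return hasEmbeddedSCTs(cert), nil
}

func main() {
	for _, with := range []bool{true, false} {
		found, err := sctCheck(with)
		fmt.Printf("withSCT=%v found=%v err=%v\n", with, found, err)
	}
}
```

SCTs can also be delivered in the TLS handshake instead of being embedded; in Go those appear in tls.ConnectionState.SignedCertificateTimestamps, so a complete audit would check both places.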
DNS and DNSSEC: the foundation for secure validation
DNS is central: you’ll use it for ACME DNS‑01 challenges, CNAME delegations, email deliverability and DANE possibilities. Harden DNS:
- Automate DNS provider API changes and verify propagation programmatically.
- Encourage or enforce DNSSEC signing for customer zones where possible; DNSSEC reduces risk of DNS spoofing during DNS‑01 challenges.
- For email: enforce SPF, DKIM and DMARC and publish them via DNS — certificate provisioning and email are tied to domain ownership.
Observability: what to instrument
Measure and alert on:
- Certificate expiry windows (every cert with 30/14/7/1 day alerts).
- Issuance failures and CA rate‑limit errors.
- OCSP stapling health per edge node, where your CA still operates OCSP (stale or missing staples can cause handshake failures for must‑staple certificates).
- CT log submission and SCT presence.
- DNS propagation times per provider and per region.
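The 30/14/7/1‑day expiry alerts can be computed with a small bucketing function that an inventory job runs over every certificate's NotAfter. This is a minimal sketch; the function names and the convention of returning 0 for "no alert" and -1 for "expired" are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// thresholds are the 30/14/7/1-day alert windows, in days, ascending.
var thresholds = []int{1, 7, 14, 30}

// daysLeft returns whole days until notAfter, rounded down (negative once expired).
func daysLeft(notAfter, now time.Time) int {
	return int(math.Floor(notAfter.Sub(now).Hours() / 24))
}

// alertBucket maps days remaining to the tightest threshold already crossed.
// It returns 0 when no alert is due and -1 when the certificate has expired.
func alertBucket(days int) int {
	if days < 0 {
		return -1
	}
	for _, t := range thresholds {
		if days <= t {
			return t
		}
	}
	return 0
}

func main() {
	now := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	for _, notAfter := range []time.Time{
		now.AddDate(0, 0, 45),
		now.AddDate(0, 0, 10),
		now.Add(12 * time.Hour),
		now.Add(-time.Hour),
	} {
		d := daysLeft(notAfter, now)
		fmt.Printf("expires %s: %d days left, bucket %d\n",
			notAfter.Format("2006-01-02"), d, alertBucket(d))
	}
}
```

Emitting the bucket as a metric label rather than paging on every evaluation keeps alert volume proportional to the number of threshold crossings.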
Operational playbook — step‑by‑step
- Inventory: export all active certificates, private key locations, CT status and OCSP stapling state.
- Design: choose wildcards only for owned namespaces; require DNS‑01 for custom domains; select HSM/KMS providers.
- Build a certificate issuance microservice that wraps ACME providers and HSM signing with policy enforcement. Expose a simple API that internal teams and edge nodes call.
- Integrate DNS provider libraries and build a DNS propagation verifier (don’t rely solely on sleep timers).
- Implement CT monitoring and automated alerting; subscribe to crt.sh and CertStream for broad coverage.
- Roll out with a canary cohort of apps and measure issuance latency, failure rates and rate limits before large rollout.
- Automate key rotation and emergency revocation workflows; run scheduled tabletop drills for certificate incidents.
Developer notes: code and tools
Two quick recipes to get your engineers productive:
ACME via lego (Go) — simplified flow
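// Fragment inside a function returning error; assumes the ACME account in config is already registered.
client, err := lego.NewClient(config)
if err != nil { return err }
// Register the DNS-01 provider before ordering; myProvider implements lego's challenge.Provider.
if err := client.Challenge.SetDNS01Provider(myProvider); err != nil { return err }
certRes, err := client.Certificate.Obtain(request) // request is a certificate.ObtainRequest
if err != nil { return err }
// Store certRes.Certificate; keep certRes.PrivateKey under HSM/KMS protection as needed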
Automated DNS verification pattern (pseudo)
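// 1. Create the TXT record at _acme-challenge.<domain> via the DNS API
//    (txtValue is the digest of the key authorization, not the raw token)
createTxt(domain, txtValue)
// 2. Poll authoritative NS for the TXT value, with backoff and an overall deadline
while !txtPresent(domain, txtValue) && !deadlineExceeded() { sleep(backoff) }
// 3. Tell ACME to validate, then remove the TXT record
completeChallenge()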
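In Go, the polling step of this pattern could look like the sketch below. It takes the lookup as a function so it can be tested without network access, doubles the delay after each miss, and uses a context deadline to bound the wait. The names waitForTXT, resolverFor and simulate are illustrative, and 192.0.2.53 is a documentation address standing in for one of the zone's authoritative name servers.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// lookupTXT matches the signature of (*net.Resolver).LookupTXT so a real
// resolver or a test fake can be passed in.
type lookupTXT func(ctx context.Context, name string) ([]string, error)

// waitForTXT polls until want appears among the TXT records for name. The
// delay doubles after each miss, capped at maxDelay; ctx bounds the total wait.
func waitForTXT(ctx context.Context, lookup lookupTXT, name, want string, delay, maxDelay time.Duration) error {
	for {
		// Lookup errors (NXDOMAIN, timeouts) are expected before propagation.
		if values, err := lookup(ctx, name); err == nil {
			for _, v := range values {
				if v == want {
					return nil
				}
			}
		}
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return fmt.Errorf("TXT %s not visible yet: %w", name, ctx.Err())
		case <-timer.C:
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

// resolverFor returns a resolver that sends queries to one name server
// (host:port), for example an authoritative server of the zone.
func resolverFor(nsAddr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, network, nsAddr)
		},
	}
}

// simulate runs waitForTXT against a fake that returns the value from lookup
// number appearOn onward (0 = never) and reports how many lookups were made.
func simulate(appearOn int, timeout time.Duration) (int, error) {
	calls := 0
	fake := func(context.Context, string) ([]string, error) {
		calls++
		if appearOn > 0 && calls >= appearOn {
			return []string{"digest"}, nil
		}
		return nil, errors.New("NXDOMAIN")
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	err := waitForTXT(ctx, fake, "_acme-challenge.shop.customer.example",
		"digest", time.Millisecond, 4*time.Millisecond)
	return calls, err
}

func main() {
	calls, err := simulate(3, time.Second)
	fmt.Println("lookups:", calls, "err:", err)
	// Real use: pass resolverFor(ns).LookupTXT as the lookup argument.
	_ = resolverFor("192.0.2.53:53")
}
```

Checking each authoritative server separately (one resolverFor per NS record) before asking the CA to validate reduces failed validations caused by one lagging server.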
Libraries and services to evaluate: lego, acme.sh, step-ca, Certbot (for smaller teams), crt.sh APIs, CertStream. For HSM: AWS CloudHSM, Google Cloud HSM, Azure Key Vault HSM or Vault HSM integrations.
Case study: onboarding 1M micro‑apps with custom domains
Situation: platform X needed to onboard 1M micro‑apps (many with custom domains) within 12 months. They implemented:
- Delegation + CNAME pattern for the majority, enabling reuse of a set of platform wildcards.
- ACME service that preferred DNS‑01 via a set of integrated DNS providers and queued challenge requests to avoid CA rate limits.
- HSM for root/intermediate signing and a signing pool to keep latency low.
- CT monitoring pipeline and automated alerts that detected and revoked a small misissue within 2 hours.
Result: zero customer‑facing TLS outages from expiry, 80% reduction in manual DNS tickets and predictable cost per certificate via negotiated CA pricing.
Future predictions (2026–2028)
- Edge platforms will increasingly offer turnkey on‑edge CA issuance that abstracts ACME and HSM for tenants.
- Expect tighter browser rules around CT and revocation, with the industry moving from OCSP toward CRL‑based checking and short‑lived certificates.
- Shorter cert lifetimes and automated rotation will be the default — manual long‑lived certs will be phased out.
- Greater adoption of HSM‑backed key generation at the edge and new KMS standards to reduce signing latency.
Checklist: launch a resilient certificate system this week
- Inventory all certs and private key locations.
- Decide wildcard policy and plan CNAME/delegation for customers.
- Choose ACME client and build a certificate issuance API.
- Integrate 1+ DNS provider API and implement DNS propagation verification.
- Put CA root/intermediate keys in HSM or HSM‑backed KMS.
- Pipe CT logs into alerting and set expiry alerts for 30/14/7/1 days.
- Run a canary group and measure certificate issuance time and failure rate.
Final takeaways — the essentials to implement now
- Automate ACME with DNS‑01 for custom domains and wildcards where required.
- Protect keys with HSM and design signing pools to avoid performance bottlenecks.
- Monitor CT logs and OCSP stapling to detect misissuance and stale revocation info.
- Use delegation patterns to minimize issuance counts and CA rate‑limit exposure.
Call to action
If you’re ready to stop babysitting certificates, start with a one‑week pilot: map your cert inventory, stand up a small ACME issuance service (try lego or step‑ca), and integrate a DNS provider API for DNS‑01. Want help? crazydomains.cloud offers a rapid audit and an edge certificate onboarding package that pairs ACME automation with HSM key custody and CT monitoring — reach out and we’ll help you move from firefighting to predictable, auditable TLS at scale.