Skip to main content
AI/MLjeremylongshore

coreweave-prod-checklist

'Production readiness checklist for CoreWeave GPU workloads.

Stars
2,267
Source
jeremylongshore/claude-code-plugins-plus-skills
Updated
2026-05-31
Slug
jeremylongshore--claude-code-plugins-plus-skills--coreweave-prod-checklist
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/HEAD/plugins/saas-packs/coreweave-pack/skills/coreweave-prod-checklist/SKILL.md -o .claude/skills/coreweave-prod-checklist.md

Drops the SKILL.md into .claude/skills/coreweave-prod-checklist.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

CoreWeave Production Checklist

Inference Services

  • GPU type and count validated for model size
  • Autoscaling configured (KServe or HPA)
  • Health and readiness probes set
  • Resource requests AND limits specified
  • Node affinity targeting correct GPU class
  • minReplicas >= 1 for production (no cold starts)

Storage

  • Model weights in PVC (not downloaded at startup)
  • Checkpoints saved to persistent storage
  • Storage class appropriate (SSD for inference, HDD for archival)

Security

  • Secrets for model tokens and registry access
  • Network policies applied
  • Container images from trusted registries

Monitoring

  • GPU utilization metrics collected
  • Inference latency and throughput tracked
  • Alert on pod restarts and OOM events
  • Log aggregation configured

Rollback

kubectl rollout undo deployment/my-inference
kubectl rollout status deployment/my-inference

Resources

Next Steps

For upgrades, see coreweave-upgrade-migration.