palantir-prod-checklist

Execute Palantir Foundry production deployment checklist and rollback

Stars: 2,267
Source: jeremylongshore/claude-code-plugins-plus-skills
Updated: 2026-05-31
Slug: jeremylongshore--claude-code-plugins-plus-skills--palantir-prod-checklist

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/HEAD/plugins/saas-packs/palantir-pack/skills/palantir-prod-checklist/SKILL.md -o .claude/skills/palantir-prod-checklist.md

Drops the SKILL.md into .claude/skills/palantir-prod-checklist.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Palantir Production Checklist

Overview

Complete go-live checklist for deploying Foundry-integrated applications to production. Covers credential management, health checks, monitoring, and rollback procedures.

Prerequisites

Staging environment tested and verified
Production OAuth2 credentials from Developer Console
Deployment pipeline configured
Monitoring infrastructure ready

Instructions

Pre-Deployment: Credentials & Config

OAuth2 client credentials in secrets manager (not personal tokens)
Scopes are minimal: only what the app actually needs
FOUNDRY_HOSTNAME points to production enrollment
Separate credentials from staging (not shared)
Credential rotation schedule documented (90-day max)
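
The credential items above can be enforced at startup instead of by convention. The following is a minimal sketch, assuming the secrets manager injects values as environment variables. Only FOUNDRY_HOSTNAME comes from this checklist; FOUNDRY_CLIENT_ID, FOUNDRY_CLIENT_SECRET, and FOUNDRY_CREDENTIAL_ISSUED_ON are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from datetime import date, timedelta

MAX_CREDENTIAL_AGE = timedelta(days=90)  # rotation ceiling from the checklist


@dataclass(frozen=True)
class FoundryConfig:
    hostname: str
    client_id: str
    client_secret: str
    issued_on: date


def load_config(env=os.environ, today=None):
    """Build a FoundryConfig from environment variables, failing fast on gaps."""
    today = today or date.today()
    # Variable names other than FOUNDRY_HOSTNAME are assumptions for this sketch.
    required = ("FOUNDRY_HOSTNAME", "FOUNDRY_CLIENT_ID",
                "FOUNDRY_CLIENT_SECRET", "FOUNDRY_CREDENTIAL_ISSUED_ON")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")

    # Refuse to start with credentials older than the rotation window.
    issued_on = date.fromisoformat(env["FOUNDRY_CREDENTIAL_ISSUED_ON"])
    if today - issued_on > MAX_CREDENTIAL_AGE:
        raise RuntimeError("Foundry credentials are past the 90-day rotation window")

    return FoundryConfig(
        hostname=env["FOUNDRY_HOSTNAME"],
        client_id=env["FOUNDRY_CLIENT_ID"],
        client_secret=env["FOUNDRY_CLIENT_SECRET"],
        issued_on=issued_on,
    )
```

Calling load_config() once during application startup makes a missing or stale credential a deploy-time failure and not a runtime surprise.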

Code Quality

All tests passing including Foundry integration tests
No hardcoded hostnames, tokens, or RIDs
Error handling covers all Foundry ApiError status codes
Rate limiting with exponential backoff implemented
Logging uses structured format (JSON) with request IDs
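
The backoff and structured-logging items can be combined in one small wrapper. This is a sketch under stated assumptions, not the SDK's own retry mechanism: FoundryCallError is a hypothetical stand-in for whatever error type the client raises, and the retryable status set and delay values are illustrative defaults.

```python
import json
import logging
import random
import time
import uuid

logger = logging.getLogger("foundry")

# Statuses worth retrying: rate limiting and transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


class FoundryCallError(Exception):
    """Hypothetical stand-in for the SDK error type; carries the HTTP status."""

    def __init__(self, status_code):
        super().__init__(f"Foundry call failed with status {status_code}")
        self.status_code = status_code


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying retryable statuses with exponential backoff and jitter."""
    request_id = str(uuid.uuid4())  # one ID ties all attempts together in the logs
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except FoundryCallError as exc:
            give_up = attempt == max_attempts or exc.status_code not in RETRYABLE
            # Delay doubles each attempt (capped), scaled by jitter in [0.5, 1.0].
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
            logger.warning(json.dumps({
                "event": "foundry_call_failed",
                "request_id": request_id,
                "attempt": attempt,
                "status_code": exc.status_code,
                "retry_in_s": None if give_up else round(delay, 3),
            }))
            if give_up:
                raise
            sleep(delay)
```

Non-retryable statuses such as 401 or 403 are re-raised immediately, since retrying a credential problem only adds load and delays the alert.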

Infrastructure

Health check endpoint verifies Foundry connectivity

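# Assumes a FastAPI-style `app` and a configured Foundry `client` defined elsewhere.
@app.get("/health")
async def health():
    try:
        # Cheap read call that exercises both network access and authentication.
        client.ontologies.Ontology.list()
        return {"status": "healthy", "foundry": "connected"}
    except foundry.ApiError as e:
        # Surface the status code so alerts can tell 401/403 apart from 5xx.
        return {"status": "degraded", "foundry": f"error_{e.status_code}"}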

Circuit breaker pattern for Foundry API calls
Graceful degradation when Foundry is unreachable
Timeout configuration: 30s for reads, 60s for writes
Connection pooling configured
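
The circuit breaker and graceful degradation items can be sketched together. The class below is a minimal illustration, not a library API: the names CircuitBreaker and CircuitOpenError, the failure threshold of 5, and the 30-second reset timeout are all assumptions to tune for the application.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback=None):
        """Run fn() through the breaker; use fallback() when degraded."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: skip the call entirely to avoid piling onto a failing API.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("Foundry circuit is open")
            # Reset timeout elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A typical fallback returns cached or partial data, which lets the application stay up in a degraded mode while Foundry is unreachable.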

Monitoring & Alerting

Metrics: request count, latency p50/p99, error rate by status code
Alert: 5xx error rate > 5% for 5 minutes → P1
Alert: p99 latency > 10s for 10 minutes → P2
Alert: 429 rate > 10/min → P2 (tune rate limiter)
Alert: 401/403 errors → P1 (credential issue)
Dashboard with Foundry API health summary
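
In practice these rules live in the monitoring system, but the thresholds are easier to review when written out as code. The sketch below evaluates one window of request samples against the limits listed above. It assumes the caller supplies a window that matches each alert's duration, and the function names and the nearest-rank percentile method are choices made for this example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_window(samples, window_minutes):
    """Return (alert, severity) pairs for one window of (status_code, latency_s) samples."""
    if not samples:
        return []
    statuses = [status for status, _ in samples]
    latencies = [latency for _, latency in samples]
    alerts = []

    # 5xx error rate above 5% of all requests in the window.
    server_errors = sum(1 for status in statuses if 500 <= status <= 599)
    if server_errors / len(samples) > 0.05:
        alerts.append(("5xx error rate", "P1"))

    # Any 401/403 points at a credential or scope problem.
    if any(status in (401, 403) for status in statuses):
        alerts.append(("auth failure", "P1"))

    # p99 latency above 10 seconds.
    if percentile(latencies, 99) > 10.0:
        alerts.append(("p99 latency", "P2"))

    # More than ten 429 responses per minute on average.
    if statuses.count(429) / window_minutes > 10:
        alerts.append(("rate limited", "P2"))

    return alerts
```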

Documentation

Incident runbook: palantir-incident-runbook
Credential rotation procedure documented
Rollback procedure documented and tested
On-call escalation path defined
Foundry support contact info available

Deploy

set -euo pipefail
# Pre-flight: abort the deploy if the Foundry API is unreachable
if curl -sf "https://$FOUNDRY_HOSTNAME/api/v2/ontologies" \
  -H "Authorization: Bearer $FOUNDRY_TOKEN" > /dev/null; then
  echo "Foundry API reachable"
else
  echo "BLOCKED: Foundry unreachable" >&2
  exit 1
fi

# Deploy (kubectl set image performs a rolling update, not a canary)
kubectl set image deployment/my-app app=myimage:v2.0.0
kubectl rollout status deployment/my-app --timeout=300s

Rollback

kubectl rollout undo deployment/my-app
kubectl rollout status deployment/my-app
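
The deploy and rollback commands can be chained so that a rollout that never becomes ready is undone automatically. This is a minimal sketch, assuming kubectl is on PATH and already pointed at the right cluster. The function names and return values are hypothetical; the kubectl subcommands are the ones used above.

```python
import subprocess


def run_kubectl(args):
    """Run kubectl with the given arguments; True if it exited with status 0."""
    return subprocess.run(["kubectl", *args]).returncode == 0


def deploy_with_rollback(image, deployment="my-app", container="app", run=run_kubectl):
    """Roll out `image`; undo the rollout if it does not become ready in time."""
    target = f"deployment/{deployment}"

    # If the image update itself is rejected, nothing changed and nothing needs undoing.
    if not run(["set", "image", target, f"{container}={image}"]):
        return "not-applied"

    # Wait for the new pods to become ready, up to five minutes.
    if run(["rollout", "status", target, "--timeout=300s"]):
        return "deployed"

    # Rollout failed or timed out: return to the previous revision and wait for it.
    run(["rollout", "undo", target])
    run(["rollout", "status", target])
    return "rolled-back"
```

The `run` parameter exists so the sequence can be exercised with a fake runner in tests, which is one way to satisfy the "rollback procedure tested" item without touching a cluster.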

Output

Production deployment with verified Foundry connectivity
Health checks passing
Monitoring and alerting active
Rollback procedure tested

Error Handling

Alert	Condition	Severity
Foundry Unreachable	Health check fails 3x	P1
Auth Failure	Any 401/403	P1
Rate Limited	429 > 10/min	P2
High Latency	p99 > 10s	P2
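
The "Health check fails 3x" condition can be tracked with a small counter. The sketch below assumes the three failures must be consecutive, which the table does not state explicitly, and HealthTracker is a hypothetical name used for illustration.

```python
class HealthTracker:
    """Counts consecutive failed health probes and signals when to page."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, healthy):
        """Record one probe result; return True when the P1 alert should fire."""
        if healthy:
            # Any success resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once when the streak reaches the threshold, not on every later failure.
        return self.consecutive_failures == self.threshold
```

Firing only on the probe that reaches the threshold avoids paging repeatedly during a long outage; a success followed by a new streak of failures fires again.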

Next Steps

For version upgrades, see palantir-upgrade-migration.