Rate Limiter
Produce a complete rate-limit handling design for an API integration. The design covers detection, pre-emptive throttling, backoff strategy, request queuing, and multi-tenant isolation. The output is the technical specification a developer implements in the integration's HTTP client layer.
Rate Limit Detection
HTTP 429 Response Handling
When the API returns HTTP 429 (Too Many Requests):
interface RateLimitResponse {
status: 429;
headers: {
'Retry-After'?: string; // Seconds to wait (preferred) or HTTP-date
'X-RateLimit-Limit'?: string; // Total limit per window
'X-RateLimit-Remaining'?: string; // Remaining requests in current window
'X-RateLimit-Reset'?: string; // Unix timestamp when limit resets
'X-RateLimit-RetryAfter'?: string; // Some APIs use this variant
};
}
function extractRetryAfter(response: Response): number {
const retryAfter = response.headers.get('Retry-After');
if (!retryAfter) {
// No header — use conservative default: 60 seconds
return 60;
}
// Delay-seconds form: digits only (parseInt alone would also accept values like "5abc")
if (/^\d+$/.test(retryAfter.trim())) {
return Math.max(parseInt(retryAfter, 10), 1); // At least 1 second
}
// Otherwise treat it as an HTTP-date and calculate seconds until that time
const resetMs = new Date(retryAfter).getTime();
if (isNaN(resetMs)) {
return 60; // Unparseable value: same conservative default as a missing header
}
const secondsUntilReset = Math.ceil((resetMs - Date.now()) / 1000);
return Math.max(secondsUntilReset, 1);
}
Rate Limit Header Tracking
Parse rate limit headers on every response, not just on 429s, to detect when the limit is being approached:
interface RateLimitState {
limit: number; // Total requests allowed per window
remaining: number; // Requests remaining in current window
resetAt: number; // Unix timestamp (in seconds) when window resets
}
function parseRateLimitHeaders(response: Response): RateLimitState | null {
const limit = response.headers.get('X-RateLimit-Limit');
const remaining = response.headers.get('X-RateLimit-Remaining');
const reset = response.headers.get('X-RateLimit-Reset');
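if (!limit || !remaining || !reset) return null;
const state: RateLimitState = {
limit: parseInt(limit, 10),
remaining: parseInt(remaining, 10),
resetAt: parseInt(reset, 10) // Assumed to be a Unix timestamp in seconds; some APIs send seconds-until-reset instead
};
// Reject malformed values so downstream arithmetic never sees NaN
if (isNaN(state.limit) || isNaN(state.remaining) || isNaN(state.resetAt)) return null;
return state;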
}
Known rate limit configurations by API type (document the specific vendor's limits):
| API | Limit | Window | Key | Notes |
|---|---|---|---|---|
| [Vendor AMS] | 100 req | 1 minute | Per API key | Sandbox: 10 req/min |
| [Carrier API] | 1000 req | 1 hour | Per account | Shared across all users |
| [Custom API] | 50 req | 1 minute | Per endpoint | Different limits per endpoint |
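The table above can also be kept as typed configuration so the client layer reads limits from one place. A minimal sketch, assuming hypothetical names (`RateLimitConfig`, `KNOWN_LIMITS`, `requestsPerMinute`) and the placeholder values from the table rather than any real vendor's limits:

```typescript
// Hypothetical config shape; entries mirror the placeholder rows in the table above.
type LimitKey = 'api_key' | 'account' | 'endpoint';

interface RateLimitConfig {
  limit: number;         // Requests allowed per window
  windowSeconds: number; // Window length in seconds
  key: LimitKey;         // What the vendor counts the limit against
}

const KNOWN_LIMITS: Record<'vendorAms' | 'carrierApi' | 'customApi', RateLimitConfig> = {
  vendorAms: { limit: 100, windowSeconds: 60, key: 'api_key' },
  carrierApi: { limit: 1000, windowSeconds: 3600, key: 'account' },
  customApi: { limit: 50, windowSeconds: 60, key: 'endpoint' },
};

// Normalize any window to requests per minute so limits can be compared
// and passed to a per-minute limiter.
function requestsPerMinute(config: RateLimitConfig): number {
  return (config.limit / config.windowSeconds) * 60;
}
```

Normalizing to a per-minute rate is a design convenience only; an hourly limit can still be consumed in a burst unless the limiter spreads requests evenly across the window.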
Pre-Emptive Throttling
Do not wait for a 429 to start throttling. Slow down before hitting the limit.
Throttle activation thresholds:
| Remaining % | Action |
|---|---|
| ≥ 30% | Full speed, no throttling |
| 20% to < 30% | Reduce rate by 25% |
| 10% to < 20% | Reduce rate by 50% |
| > 0% to < 10% | Reduce rate by 75%, log warning |
| 0% (exhausted) | Pause all requests until the window resets; log alert |
Implementation:
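// Promise-based delay helper used by the code in this specification
const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));
class PreEmptiveThrottler {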
private rateLimitState: RateLimitState | null = null;
updateState(state: RateLimitState): void {
this.rateLimitState = state;
}
async waitIfNeeded(): Promise<void> {
if (!this.rateLimitState || this.rateLimitState.limit <= 0) return; // No usable state yet (also avoids dividing by zero): proceed
const remainingPct = this.rateLimitState.remaining / this.rateLimitState.limit;
if (remainingPct <= 0) {
// Exhausted — wait until reset
const msUntilReset = (this.rateLimitState.resetAt * 1000) - Date.now();
const waitMs = Math.max(msUntilReset + 500, 1000); // Add 500ms buffer after reset
logger.warn('Rate limit exhausted, pausing', { waitMs, resetAt: new Date(this.rateLimitState.resetAt * 1000) });
await sleep(waitMs);
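} else if (remainingPct < 0.10) {
await sleep(750); // Fixed 750ms delay before each request (stands in for the 75% reduction)
} else if (remainingPct < 0.20) {
await sleep(500); // Fixed 500ms delay before each request (stands in for the 50% reduction)
} else if (remainingPct < 0.30) {
await sleep(250); // Fixed 250ms delay before each request (stands in for the 25% reduction)
}
// >= 30%: no delay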
}
}
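The fixed delays in the class above only approximate the percentage reductions in the thresholds table. As an alternative, the request spacing can be derived from the percentages directly. A minimal sketch, assuming hypothetical helper names (`rateMultiplier`, `throttledIntervalMs`) and a known baseline rate in requests per minute:

```typescript
// Fraction of the normal request rate to keep for a given remaining ratio (0..1),
// following the thresholds table: 25%, 50%, and 75% reductions.
function rateMultiplier(remainingRatio: number): number {
  if (remainingRatio <= 0) return 0;       // Exhausted: pause until reset
  if (remainingRatio < 0.10) return 0.25;  // Reduce rate by 75%
  if (remainingRatio < 0.20) return 0.5;   // Reduce rate by 50%
  if (remainingRatio < 0.30) return 0.75;  // Reduce rate by 25%
  return 1;                                // Full speed
}

// Milliseconds between requests after applying the reduction.
// Returns Infinity when the limit is exhausted (caller should wait for the reset instead).
function throttledIntervalMs(baseRequestsPerMinute: number, remainingRatio: number): number {
  const multiplier = rateMultiplier(remainingRatio);
  if (multiplier === 0) return Infinity;
  return 60000 / (baseRequestsPerMinute * multiplier);
}
```

For a baseline of 80 req/min, the spacing grows from 750 ms at full speed to 1000 ms, 1500 ms, and 3000 ms across the three throttled bands.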
Backoff Strategy
Applied after a 429 response is received (reactive, not pre-emptive):
Primary strategy: Use the Retry-After header value. It reflects the server's own view of the limit, so it is generally more accurate than any client-side calculated backoff.
Fallback strategy (when no Retry-After header):
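Exponential backoff with jitter. Each wait is drawn uniformly from the range shown; the range doubles per attempt, with the upper bound capped at 120 seconds. (Strictly, "full jitter" would draw from 0 up to the upper bound; the non-zero lower bounds here guarantee a minimum delay.)
attempt_1: wait = random(5, 10) seconds
attempt_2: wait = random(10, 20) seconds
attempt_3: wait = random(20, 40) seconds
attempt_4: wait = random(40, 80) seconds
attempt_5: wait = random(80, 120) seconds [upper bound capped at max; if still failing, dead-letter]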
After attempt 5 with 429: Do not continue. Rate limiting this severe usually indicates a systemic misconfiguration. Dead-letter the request and alert operations.
Do not retry immediately: Some integrations retry on 429 with no delay. This makes the rate limit problem worse (the retry itself counts against the limit) and can cause the API to block the client entirely.
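The Retry-After-first rule and the fallback ranges above can be sketched as a single decision function. This is an illustration under stated assumptions: the names (`backoffRangeSeconds`, `resolveWaitSeconds`) are hypothetical, the constants are read off the attempt list above, and `attempt` counts the 429 responses received so far for one request.

```typescript
// Assumed constants, taken from the attempt list above (not vendor-specified).
const FIRST_FLOOR_SECONDS = 5;
const MAX_WAIT_SECONDS = 120;
const MAX_ATTEMPTS = 5;

// Range the fallback wait is drawn from for a 1-based attempt number:
// 5-10, 10-20, 20-40, 40-80, 80-120 seconds.
function backoffRangeSeconds(attempt: number): [number, number] {
  const floor = FIRST_FLOOR_SECONDS * 2 ** (attempt - 1);
  const ceiling = Math.min(floor * 2, MAX_WAIT_SECONDS);
  return [floor, ceiling];
}

// Returns the seconds to wait before retrying, or null when the request
// should be dead-lettered instead of retried.
function resolveWaitSeconds(
  attempt: number,
  retryAfterSeconds: number | null,
  random: () => number = Math.random,
): number | null {
  if (attempt > MAX_ATTEMPTS) return null;                  // Still 429 after attempt 5: stop
  if (retryAfterSeconds !== null) return retryAfterSeconds; // Server-provided value wins
  const [floor, ceiling] = backoffRangeSeconds(attempt);
  return floor + random() * (ceiling - floor);              // Uniform draw within the range
}
```

Passing `random` as a parameter is only there to make the jitter testable; production code would use the default.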
Request Queuing
Control the outbound request rate using a token bucket algorithm:
class TokenBucketRateLimiter {
private tokens: number;
private lastRefill: number;
private readonly maxTokens: number;
private readonly refillRatePerMs: number;
constructor(requestsPerMinute: number) {
this.maxTokens = requestsPerMinute;
this.tokens = requestsPerMinute;
this.lastRefill = Date.now();
this.refillRatePerMs = requestsPerMinute / 60000; // tokens per millisecond
}
private refill(): void {
const now = Date.now();
const elapsedMs = now - this.lastRefill;
const newTokens = elapsedMs * this.refillRatePerMs;
this.tokens = Math.min(this.maxTokens, this.tokens + newTokens);
this.lastRefill = now;
}
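async acquire(tokensNeeded: number = 1): Promise<void> {
// A request larger than the bucket capacity could never be granted and would loop forever
if (tokensNeeded > this.maxTokens) {
throw new RangeError('tokensNeeded exceeds bucket capacity');
}
while (true) {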
this.refill();
if (this.tokens >= tokensNeeded) {
this.tokens -= tokensNeeded;
return;
}
// Not enough tokens — wait for refill
const msToWait = (tokensNeeded - this.tokens) / this.refillRatePerMs;
await sleep(Math.ceil(msToWait) + 10); // 10ms buffer
}
}
}
// Usage — configure at 80% of API limit (safety buffer):
// API limit: 100 req/min → configure limiter at 80 req/min
const limiter = new TokenBucketRateLimiter(80);
const throttler = new PreEmptiveThrottler();
async function makeApiRequest(endpoint: string, options: RequestOptions) {
await limiter.acquire(); // Wait for token
await throttler.waitIfNeeded(); // Pre-emptive throttle
const response = await httpClient.request(endpoint, options);
const rateLimitState = parseRateLimitHeaders(response);
if (rateLimitState) throttler.updateState(rateLimitState);
return response;
}
Priority Queue
For integrations with multiple request types, use a priority queue to ensure time-sensitive requests are not delayed by bulk batch operations:
| Priority | Request Type | Examples |
|---|---|---|
| High | Real-time triggered | Webhook response, user-initiated lookup, payment processing |
| Normal | Scheduled sync | Hourly policy sync, daily client update |
| Low | Bulk batch | Historical data load, nightly full reconciliation |
Implementation: Use a separate token bucket per priority tier (three for the tiers above). Allocate tokens preferentially:
- High priority: 60% of token budget
- Normal: 30%
- Low: 10%
When the overall token budget is nearly exhausted, pause low-priority requests first, then normal-priority requests. High-priority requests are the last to be delayed.
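The 60/30/10 allocation can be sketched as a split of the configured per-minute budget into one rate per tier. The names (`Priority`, `PRIORITY_SHARE_PERCENT`, `splitBudget`) are hypothetical, and the minimum budget of 10 is an assumption made so that no tier ends up with a zero rate:

```typescript
type Priority = 'high' | 'normal' | 'low';

// Shares from the allocation list above, kept as whole percentages
// so the arithmetic stays in integers.
const PRIORITY_SHARE_PERCENT: Record<Priority, number> = { high: 60, normal: 30, low: 10 };

// Splits a per-minute budget into per-tier rates that always sum to the budget.
// Rounding remainders go to the high-priority tier.
function splitBudget(requestsPerMinute: number): Record<Priority, number> {
  if (!Number.isInteger(requestsPerMinute) || requestsPerMinute < 10) {
    throw new RangeError('Budget must be an integer of at least 10 requests per minute');
  }
  const normal = Math.floor((requestsPerMinute * PRIORITY_SHARE_PERCENT.normal) / 100);
  const low = Math.floor((requestsPerMinute * PRIORITY_SHARE_PERCENT.low) / 100);
  return { high: requestsPerMinute - normal - low, normal, low };
}
```

Each resulting rate would then configure its own `TokenBucketRateLimiter`, for example 48, 24, and 8 req/min for an 80 req/min budget. A fixed split leaves unused high-priority capacity idle; whether to let lower tiers borrow it is a separate design decision.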
Per-Client Rate Limit Isolation
For multi-tenant integrations where each client has their own API credentials:
Problem: If one client's API key hits the rate limit, it should not affect other clients.
Solution: Maintain separate token bucket instances, one per API key (keyed below by client ID, on the assumption that each client has exactly one key):
class MultiTenantRateLimiter {
private limiters: Map<string, TokenBucketRateLimiter> = new Map();
getLimiter(clientId: string, requestsPerMinute: number): TokenBucketRateLimiter {
// Note: requestsPerMinute only takes effect when the limiter is first created
let limiter = this.limiters.get(clientId);
if (!limiter) {
limiter = new TokenBucketRateLimiter(requestsPerMinute);
this.limiters.set(clientId, limiter);
}
return limiter;
}
// Cleanup stale limiters (clients no longer active)
cleanup(activeClientIds: Set<string>): void {
for (const [clientId] of this.limiters) {
if (!activeClientIds.has(clientId)) {
this.limiters.delete(clientId);
}
}
}
}
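// Usage (create one shared instance at startup, then look up a limiter per request):
const multiTenantLimiter = new MultiTenantRateLimiter();
const clientLimiter = multiTenantLimiter.getLimiter(clientId, clientConfig.requestsPerMinute);
await clientLimiter.acquire();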
Rate limit configuration per client: Store each client's API key rate limit in their configuration record (database or Azure App Configuration). Use the vendor-documented limit × 80% as the configured limit.
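The 80% rule can be applied when the client record is loaded. A minimal sketch, assuming a hypothetical record shape (`ClientRateLimitRecord`) and helper name (`configuredRequestsPerMinute`); the actual storage schema is not specified here:

```typescript
// Hypothetical shape of the per-client configuration record.
interface ClientRateLimitRecord {
  clientId: string;
  vendorLimitPerMinute: number; // Vendor-documented limit for this client's API key
}

const SAFETY_PERCENT = 80; // Configure the limiter at 80% of the vendor limit

// Rounds down so the configured rate never exceeds 80% of the vendor limit,
// but keeps at least 1 req/min so the limiter stays usable.
function configuredRequestsPerMinute(record: ClientRateLimitRecord): number {
  return Math.max(1, Math.floor((record.vendorLimitPerMinute * SAFETY_PERCENT) / 100));
}
```

The returned value is what would be passed as `requestsPerMinute` to `getLimiter`.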
Metrics and Observability
Log rate limit events for monitoring and capacity planning:
| Event | Log Level | Fields |
|---|---|---|
| Pre-emptive throttle triggered | Info | remaining_pct, delay_ms, client_id |
| 429 received | Warning | endpoint, retry_after_s, attempt, client_id |
| Rate limit exhausted, waiting for reset | Warning | reset_at, wait_ms, client_id |
| Rate limit recovered (after wait) | Info | client_id |
| Backoff exceeded, dead-lettering | Error | endpoint, attempts, client_id |
Dashboard metric: Track rate limit headroom (remaining requests as a percentage of the limit) over time. If the rolling average drops below 20%, the integration likely needs either a higher API tier or optimized batching.
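The headroom metric can be sketched as a rolling average over the most recent header readings. The class name (`HeadroomTracker`), the sample-count window, and the method names are assumptions for illustration; a real deployment would more likely emit the raw ratio to its metrics system and compute the average there.

```typescript
// Keeps the last N headroom samples (remaining / limit) and reports their average.
class HeadroomTracker {
  private readonly samples: number[] = [];
  private readonly windowSize: number;
  private readonly alertThreshold: number;

  constructor(windowSize: number, alertThreshold: number = 0.2) {
    this.windowSize = windowSize;
    this.alertThreshold = alertThreshold;
  }

  // Record one reading, e.g. from the parsed rate limit headers of a response.
  record(remaining: number, limit: number): void {
    if (limit <= 0) return; // Ignore unusable readings
    this.samples.push(remaining / limit);
    if (this.samples.length > this.windowSize) this.samples.shift(); // Drop the oldest
  }

  // Rolling average headroom as a ratio (0..1), or null before any reading.
  average(): number | null {
    if (this.samples.length === 0) return null;
    const sum = this.samples.reduce((total, sample) => total + sample, 0);
    return sum / this.samples.length;
  }

  // True when the rolling average has dropped below the alert threshold (20% by default).
  belowThreshold(): boolean {
    const avg = this.average();
    return avg !== null && avg < this.alertThreshold;
  }
}
```

Calling `record(state.remaining, state.limit)` wherever the rate limit headers are parsed would keep the tracker current.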
Output Format
Deliver as:
- Rate limit detection specification (header parsing, 429 handling)
- Known rate limits table (vendor, limit, window, key type)
- Pre-emptive throttling thresholds table
- Backoff strategy (Retry-After-first, then exponential fallback — with parameters)
- Token bucket implementation (pseudocode with configuration parameters)
- Priority queue design (if applicable — describe tiers and allocation)
- Multi-tenant isolation design (if applicable)
- Metrics and logging specification
- Vendor-specific notes (quirks in the target API's rate limit implementation that the developer must know)