
observability

Structured logging, debugging (pdb/ipdb), profiling (cProfile/line_profiler), and performance monitoring. Use when adding logging, debugging issues, or optimizing performance. TRIGGER when: logging, debug, profiling, performance monitoring, metrics, stack trace. DO NOT TRIGGER when: feature implementation, testing, documentation, config changes.

Stars: 29
Source: akaszubski/autonomous-dev
Updated: 2026-05-31
Slug: akaszubski--autonomous-dev--observability

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/akaszubski/autonomous-dev/HEAD/plugins/autonomous-dev/skills/observability/SKILL.md -o .claude/skills/observability.md

Drops the SKILL.md into .claude/skills/observability.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Observability Skill

Comprehensive guide to logging, debugging, profiling, and performance monitoring in Python applications.

When This Skill Activates

  • Adding logging to code
  • Debugging production issues
  • Profiling performance bottlenecks
  • Monitoring application metrics
  • Analyzing stack traces
  • Performance optimization
  • Keywords: "logging", "debug", "profiling", "performance", "monitoring"

Core Concepts

1. Structured Logging

Structured logging with JSON format for machine-readable logs and rich context.

Why Structured Logging?

  • Machine-parseable (easy to search, filter, aggregate)
  • Context-rich (attach metadata to log entries)
  • Consistent format across services

Key Features:

  • JSON-formatted logs
  • Log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  • Context logging with extra metadata
  • Best practices for meaningful logs

Example:

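import logging

logger = logging.getLogger(__name__)

# Keys passed via `extra` become attributes on the LogRecord. They show up in
# the output only if the configured formatter emits them (e.g. a JSON formatter).
logger.info("User action", extra={
    "user_id": 123,
    "action": "login",
    "ip": "192.168.1.1"
})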

See: docs/structured-logging.md for Python logging setup and patterns
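
The snippet above attaches metadata to the record but does not by itself produce JSON. A minimal sketch of a JSON formatter using only the standard library might look like the following. The class name `JsonFormatter`, the logger name, and the choice of output fields are assumptions for illustration, not the setup described in docs/structured-logging.md.

```python
import json
import logging

# Attribute names present on every LogRecord; anything else came from `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as one JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy the fields supplied via `extra=` into the payload.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("observability_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User action", extra={"user_id": 123, "action": "login"})
```

Computing the set of standard attributes from a throwaway `LogRecord` avoids hard-coding a list that can differ between Python versions.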


2. Debugging Techniques

Interactive debugging with pdb/ipdb and effective debugging strategies.

Tools:

  • Print debugging - Quick and simple
  • pdb - Python's built-in debugger
  • ipdb - IPython-enhanced debugger
  • Post-mortem debugging - Debug after crash

pdb Commands:

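  • n (next) - Execute the current line, stepping over function calls
  • s (step) - Step into the function called on the current line
  • c (continue) - Continue execution until the next breakpoint
  • p expression - Evaluate and print an expression (e.g. a variable)
  • l (list) - List source code around the current line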
  • q - Quit debugger

Example:

import pdb; pdb.set_trace()  # Debugger starts here

See: docs/debugging.md for interactive debugging patterns
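
As a complement to `pdb.set_trace()`, the built-in `breakpoint()` (Python 3.7+) can be placed behind a condition so the debugger opens only when the suspicious state occurs. The sketch below is illustrative; `first_negative` and `disable_breakpoints` are hypothetical helper names, not part of this skill.

```python
import sys


def first_negative(values):
    """Return the index of the first negative value, or None."""
    for index, value in enumerate(values):
        if value < 0:
            # Pause only when the suspicious condition occurs.
            # breakpoint() calls sys.breakpointhook(), which defaults to
            # pdb.set_trace().
            breakpoint()
            return index
    return None


def disable_breakpoints():
    """Make breakpoint() a no-op for this process, e.g. in automated runs."""
    sys.breakpointhook = lambda *args, **kwargs: None
```

With the default hook, the same effect is available without code changes: setting the environment variable `PYTHONBREAKPOINT=0` disables every `breakpoint()` call, and `PYTHONBREAKPOINT=ipdb.set_trace` routes it to ipdb if that package is installed.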


3. Profiling

CPU and memory profiling to identify performance bottlenecks.

Tools:

  • cProfile - CPU profiling (built-in)
  • line_profiler - Line-by-line CPU profiling
  • memory_profiler - Memory usage analysis
  • py-spy - Sampling profiler (no code changes)

cProfile Example:

python -m cProfile -s cumulative script.py

Profile Decorator:

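import cProfile
import functools
import pstats

def profile(func):
    """Print the top 10 functions by cumulative time on every call to func."""
    @functools.wraps(func)  # Preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        try:
            return func(*args, **kwargs)
        finally:
            # Stop profiling and report even if func raises
            profiler.disable()
            stats = pstats.Stats(profiler)
            stats.sort_stats('cumulative')
            stats.print_stats(10)  # Top 10 functions
    return wrapper

@profile
def slow_function():
    # Your code here
    pass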

See: docs/profiling.md for comprehensive profiling techniques
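
When the report should go to a log or a file instead of stdout, the statistics can be captured as text. This is a minimal sketch under the assumption that a plain string report is enough; `busy_sum` and `profile_to_text` are hypothetical names introduced for the example.

```python
import cProfile
import io
import pstats


def busy_sum(n):
    """Hypothetical workload: sum of squares computed in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_to_text(func, *args, limit=5, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    # runcall() enables the profiler, calls func, and disables it afterwards.
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


result, report = profile_to_text(busy_sum, 100_000)
print(report)
```

Returning the report as a string keeps the profiling helper free of side effects, so the caller decides whether to log it, write it to disk, or discard it.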


4. Monitoring & Metrics

Performance monitoring, timing decorators, and simple metrics.

Timing Patterns:

  • Timing decorator - Measure function execution time
  • Context manager timer - Measure code block duration
  • Performance assertions - Fail if too slow

Simple Metrics:

  • Counters - Track event occurrences
  • Histograms - Track value distributions

Example:

import time
from functools import wraps

def timer(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
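        start = time.perf_counter()  # Monotonic, high-resolution clock for intervals
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        print(f"{func.__name__} took {duration:.2f}s")  # Demo only; use a logger in production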
        return result
    return wrapper

@timer
def process_data():
    # Your code here
    pass

See: docs/monitoring-metrics.md for stack traces, timers, and metrics
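
The context manager timer, performance assertion, counter, and histogram patterns listed in this section can be sketched together as follows. The `Metrics` class, the `timed` helper, and the `max_seconds` parameter are hypothetical names chosen for illustration; the store is in-process, unbounded, and not thread-safe, so treat it as a sketch and not a replacement for a metrics library.

```python
import logging
import math
import time
from collections import Counter, defaultdict
from contextlib import contextmanager

logger = logging.getLogger(__name__)


class Metrics:
    """Hypothetical in-process metrics store (not thread-safe)."""

    def __init__(self):
        self.counters = Counter()             # event name -> occurrences
        self.histograms = defaultdict(list)   # metric name -> observed values

    def increment(self, name, amount=1):
        self.counters[name] += amount

    def observe(self, name, value):
        self.histograms[name].append(value)

    def percentile(self, name, pct):
        """Nearest-rank percentile of the values observed for name."""
        values = sorted(self.histograms.get(name, []))
        if not values:
            raise ValueError(f"no observations for {name!r}")
        rank = max(1, math.ceil(pct * len(values) / 100))
        return values[rank - 1]


metrics = Metrics()


@contextmanager
def timed(block, max_seconds=None):
    """Time a code block, record the duration, optionally enforce a budget."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs on success and on failure, so every attempt is recorded.
        duration = time.perf_counter() - start
        metrics.observe(block, duration)
        logger.info("timed block finished",
                    extra={"block": block, "duration_s": duration})
    # Reached only if the block itself did not raise.
    if max_seconds is not None and duration > max_seconds:
        raise AssertionError(
            f"{block} took {duration:.3f}s, budget was {max_seconds:.3f}s")


with timed("parse_input", max_seconds=1.0):
    metrics.increment("records_parsed", 3)
```

The budget check sits after the `finally` clause on purpose: raising inside `finally` would replace an exception coming from the timed block with the assertion error.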


5. Best Practices & Anti-Patterns

Debugging strategies and logging anti-patterns to avoid.

Debugging Best Practices:

  1. Binary Search Debugging - Narrow down the problem area
  2. Rubber Duck Debugging - Explain the problem to someone (or something)
  3. Add Assertions - Catch bugs early
  4. Simplify and Isolate - Reproduce with minimal code

Logging Anti-Patterns to Avoid:

  • Logging sensitive data (passwords, tokens)
  • Logging on every iteration of a hot loop (count events and log one summary instead)
  • No context in error logs
  • Inconsistent log formats
  • Too verbose logging (noise)

See: docs/best-practices-antipatterns.md for detailed strategies
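
Two of the anti-patterns above lend themselves to a short sketch: scrubbing sensitive fields before they reach a handler, and replacing per-iteration log calls with a single summary. The field names in `SENSITIVE_KEYS`, the filter class, and `process_all` are assumptions made for this example. The filter only covers values passed through `extra=`; it does not inspect the message text.

```python
import logging

SENSITIVE_KEYS = {"password", "token", "api_key"}  # hypothetical field names


class RedactSensitiveFilter(logging.Filter):
    """Overwrite sensitive `extra=` fields before handlers see the record."""

    def filter(self, record):
        for key in SENSITIVE_KEYS:
            if hasattr(record, key):
                setattr(record, key, "[REDACTED]")
        return True  # never drop the record, only scrub it


logger = logging.getLogger("redaction_demo")  # hypothetical logger name
logger.addFilter(RedactSensitiveFilter())


def process_all(items):
    """Count failures inside the loop and log one summary afterwards."""
    failures = 0
    for item in items:
        if item < 0:
            failures += 1  # counted, not logged per iteration
    logger.info("batch processed",
                extra={"total": len(items), "failures": failures})
    return failures
```

A filter attached to a logger applies only to records created on that logger, not to its descendants. Attaching the same filter to each handler with `handler.addFilter(...)` covers every record the handler emits.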


Quick Reference

| Tool | Use Case | Details |
|------|----------|---------|
| Structured Logging | Production logs | docs/structured-logging.md |
| pdb/ipdb | Interactive debugging | docs/debugging.md |
| cProfile | CPU profiling | docs/profiling.md |
| line_profiler | Line-by-line profiling | docs/profiling.md |
| memory_profiler | Memory analysis | docs/profiling.md |
| Timer decorator | Function timing | docs/monitoring-metrics.md |
| Context timer | Code block timing | docs/monitoring-metrics.md |

Logging Cheat Sheet

import logging

# Setup
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Usage
logger.debug("Debug message")       # Detailed diagnostic
logger.info("Info message")         # General information
logger.warning("Warning message")   # Warning (recoverable)
logger.error("Error message")       # Error (handled)
logger.critical("Critical message") # Critical (unrecoverable)

# With context
logger.info("User action", extra={"user_id": 123, "action": "login"})

Debugging Cheat Sheet

# pdb
import pdb; pdb.set_trace()

# ipdb (enhanced)
import ipdb; ipdb.set_trace()

# Post-mortem (debug after crash)
import pdb, sys
try:
    # Your code
    pass
except Exception:
    pdb.post_mortem(sys.exc_info()[2])

Profiling Cheat Sheet

# CPU profiling
python -m cProfile -s cumulative script.py

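# Line profiling (decorate the functions to measure with @profile)
kernprof -l -v script.py

# Memory profiling (reports line by line for functions decorated with @profile)
python -m memory_profiler script.py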

# Sampling profiler (no code changes)
py-spy top --pid 12345

Progressive Disclosure

This skill uses progressive disclosure to prevent context bloat:

  • Index (this file): High-level concepts and quick reference (<500 lines)
  • Detailed docs: docs/*.md files with implementation details (loaded on-demand)

Available Documentation:

  • docs/structured-logging.md - Logging setup, levels, JSON format, best practices
  • docs/debugging.md - Print debugging, pdb/ipdb, post-mortem debugging
  • docs/profiling.md - cProfile, line_profiler, memory_profiler, py-spy
  • docs/monitoring-metrics.md - Stack traces, timing patterns, simple metrics
  • docs/best-practices-antipatterns.md - Debugging strategies and logging anti-patterns

Cross-References

Related Skills:

  • error-handling-patterns - Error handling best practices
  • python-standards - Python coding conventions
  • testing-guide - Testing and debugging strategies
  • performance-optimization - Performance tuning techniques

Related Tools:

  • Python logging - Standard library logging module
  • pdb/ipdb - Interactive debuggers
  • cProfile - CPU profiling
  • memory_profiler - Memory analysis
  • py-spy - Sampling profiler

Key Takeaways

  1. Use structured logging - JSON format for machine-readable logs
  2. Log at appropriate levels - DEBUG < INFO < WARNING < ERROR < CRITICAL
  3. Include context - Add metadata to logs (user_id, request_id, etc.)
  4. Don't log sensitive data - Passwords, tokens, PII
  5. Use pdb/ipdb for debugging - Interactive debugging is powerful
  6. Profile before optimizing - Measure to find real bottlenecks
  7. Use cProfile for CPU profiling - Identify slow functions
  8. Use line_profiler for line-level profiling - Fine-grained analysis
  9. Use memory_profiler for memory leaks - Track memory usage
  10. Time critical sections - Decorator or context manager
  11. Binary search debugging - Narrow down problem area
  12. Simplify and isolate - Reproduce with minimal code

Hard Rules

FORBIDDEN:

  • Logging sensitive data (passwords, tokens, API keys) at any level
  • Using print() for production logging (MUST use structured logging)
  • Swallowing exceptions silently without logging
  • except Exception: (or except Exception as e:) without a subsequent raise or logging.exception()/logger.error(..., exc_info=True)
  • Bare except: pass — discards exception with zero handling
  • except Exception: pass — syntactically explicit but semantically identical to bare except: pass
  • contextlib.suppress() wrapping error-critical operations without inline justification comment
  • finally blocks that contain return, break, or continue — these suppress any pending exception from the try body
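
The last rule is easy to miss, so here is a minimal demonstration of why `return` inside `finally` is forbidden. Both function names are hypothetical and exist only for this illustration.

```python
def swallow_with_finally():
    """NON-COMPLIANT: the return in finally discards the in-flight ValueError."""
    try:
        raise ValueError("lost")
    finally:
        return "cleanup"  # the caller never sees the ValueError


def propagate_with_finally(log):
    """COMPLIANT: cleanup runs, then the exception keeps propagating."""
    try:
        raise ValueError("kept")
    finally:
        log.append("cleanup ran")  # no return, break, or continue here


print(swallow_with_finally())  # completes normally despite the raise
```

`break` and `continue` inside `finally` discard a pending exception in the same way when the `try` statement sits inside a loop.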

REQUIRED (compliant exception handling MUST use at least one of):

  • Re-raise: After logging, call raise (bare) or raise NewError(...) from original_exc to propagate the exception
  • Log with exc_info: logger.error("Operation failed", exc_info=True) or logging.exception("Operation failed") — preserves full stack trace without suppressing
  • contextlib.suppress() with justification: Acceptable ONLY for genuinely non-critical cleanup operations; MUST include an inline comment explaining why suppression is safe

Examples:

import contextlib
import logging

logger = logging.getLogger(__name__)

# COMPLIANT: re-raise after logging
try:
    process(data)
except ValueError as exc:
    logger.error("Invalid data: %s", exc, exc_info=True)
    raise

# COMPLIANT: log with exc_info (caller gets full stack trace in logs)
try:
    send_metric(value)
except ExternalServiceError:
    logger.exception("Metric send failed — continuing without metric")

# COMPLIANT: contextlib.suppress with justification
with contextlib.suppress(FileNotFoundError):
    # Optional cache file; absence is expected on first run
    cache_path.unlink()

# NON-COMPLIANT: silent swallow
try:
    critical_operation()
except Exception:
    pass  # FORBIDDEN

# NON-COMPLIANT: log without exc_info and without re-raise
try:
    critical_operation()
except Exception as e:
    logger.error("Failed: %s", e)  # FORBIDDEN — no stack trace, exception swallowed