Pattern Extraction Skill

When to Use This Skill

Use pattern extraction when you need to:

Analyze existing infrastructure to create reusable templates
Convert hardcoded values into configurable template variables
Identify configuration patterns across multiple similar files
Detect technology stacks and their conventions
Extract naming conventions and structural patterns
Generate template metadata from existing code
Create variable schemas from inferred types and constraints

Perfect for:

Converting existing IaC to templates
Building template libraries from production code
Standardizing infrastructure patterns
Automating template generation
Identifying refactoring opportunities

Core Capabilities

1. Technology Stack Detection

Automatically identify:

IaC Frameworks: Terraform, Pulumi, CloudFormation, ARM, Bicep
Cloud Providers: AWS, Azure, GCP, multi-cloud patterns
Service Types: Kubernetes, Docker, serverless, containers
Configuration Formats: YAML, JSON, HCL, TOML
Build Systems: Helm, Kustomize, Jsonnet
CI/CD Platforms: GitHub Actions, GitLab CI, Jenkins, Harness

2. Configuration Pattern Extraction

Extract patterns from:

Resource definitions and relationships
Environment-specific configurations
Naming conventions and tagging strategies
Security policies and compliance rules
Network topologies and architectures
Deployment strategies and workflows

3. Variable Identification

Intelligent detection of:

Hardcoded Values: Strings, numbers, booleans that should be variables
Repeated Values: Values appearing multiple times across files
Environment Indicators: dev, staging, prod patterns
Naming Patterns: Prefixes, suffixes, delimiters
Secret Patterns: API keys, passwords, tokens (flag for security)
Configuration Schemas: Type inference from usage

4. Structure Analysis

Analyze:

File organization and directory structure
Module boundaries and dependencies
Resource hierarchies and relationships
Configuration inheritance patterns
Composition and reuse strategies
Template inclusion patterns

Pattern Detection Matrix

Pattern Type	Indicators	Extraction Method	Output Format
Environment Values	`dev`, `staging`, `prod` in names	Context-aware regex	`{{ environment }}`
Resource Names	Repeated prefixes/suffixes	Token analysis	`{{ project_name }}-{{ resource_type }}`
Region/Location	`us-east-1`, `westeurope`	Cloud provider patterns	`{{ region }}`
Version Numbers	Semantic versioning patterns	Regex + validation	`{{ version }}`
Port Numbers	Common service ports	Port range analysis	`{{ port }}`
Size/Scale	Instance types, node counts	Capacity patterns	`{{ instance_size }}`
CIDR Blocks	IP address ranges	Network pattern analysis	`{{ cidr_block }}`
Tags/Labels	Key-value metadata	Metadata extraction	`{{ tags }}`
Secret References	Vault paths, secret names	Secret pattern detection	`{{ secret_ref }}` (secure)
Feature Flags	Boolean toggles	Conditional analysis	`{{ enable_feature }}`

Variable Inference Rules

Before/After Transformation Examples

Example 1: Resource Names

Before:

resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  tags = {
    Name        = "myapp-web-server-prod"
    Environment = "production"
    Project     = "myapp"
  }
}

After:

resource "aws_instance" "web_server" {
  ami           = "{{ ami_id }}"
  instance_type = "{{ instance_type }}"

  tags = {
    Name        = "{{ project_name }}-web-server-{{ environment }}"
    Environment = "{{ environment }}"
    Project     = "{{ project_name }}"
  }
}

Extracted Variables:

variables:
  ami_id:
    type: string
    description: "AMI ID for the EC2 instance"
    default: "ami-0c55b159cbfafe1f0"
    pattern: "^ami-[a-f0-9]{17}$"

  instance_type:
    type: string
    description: "EC2 instance type"
    default: "t3.medium"
    allowed_values: ["t3.micro", "t3.small", "t3.medium", "t3.large"]

  project_name:
    type: string
    description: "Project identifier used in resource naming"
    default: "myapp"
    pattern: "^[a-z][a-z0-9-]{2,30}$"

  environment:
    type: string
    description: "Deployment environment"
    default: "production"
    allowed_values: ["dev", "staging", "production"]

Example 2: Kubernetes Deployment

Before:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        version: "1.21.0"
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.0
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"

After:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ app_name }}-deployment
  namespace: {{ namespace }}
spec:
  replicas: {{ replica_count }}
  selector:
    matchLabels:
      app: {{ app_name }}
  template:
    metadata:
      labels:
        app: {{ app_name }}
        version: "{{ app_version }}"
    spec:
      containers:
      - name: {{ app_name }}
        image: {{ container_image }}:{{ app_version }}
        ports:
        - containerPort: {{ container_port }}
        resources:
          requests:
            memory: "{{ memory_request }}"
            cpu: "{{ cpu_request }}"
          limits:
            memory: "{{ memory_limit }}"
            cpu: "{{ cpu_limit }}"

Extracted Variables:

variables:
  app_name:
    type: string
    description: "Application name"
    default: "nginx"

  namespace:
    type: string
    description: "Kubernetes namespace"
    default: "production"

  replica_count:
    type: integer
    description: "Number of pod replicas"
    default: 3
    min: 1
    max: 10

  app_version:
    type: string
    description: "Application version"
    default: "1.21.0"
    pattern: "^\\d+\\.\\d+\\.\\d+$"

  container_image:
    type: string
    description: "Container image name"
    default: "nginx"

  container_port:
    type: integer
    description: "Container port"
    default: 80

  memory_request:
    type: string
    description: "Memory request"
    default: "128Mi"

  memory_limit:
    type: string
    description: "Memory limit"
    default: "256Mi"

  cpu_request:
    type: string
    description: "CPU request"
    default: "250m"

  cpu_limit:
    type: string
    description: "CPU limit"
    default: "500m"

Example 3: Azure Bicep

Before:

resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
  name: 'mystorageacct12345'
  location: 'westeurope'
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
    minimumTlsVersion: 'TLS1_2'
    supportsHttpsTrafficOnly: true
    allowBlobPublicAccess: false
  }
  tags: {
    environment: 'production'
    costCenter: 'engineering'
    project: 'platform'
  }
}

After:

resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
  name: '{{ storage_account_name }}'
  location: '{{ location }}'
  sku: {
    name: '{{ sku_name }}'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: '{{ access_tier }}'
    minimumTlsVersion: '{{ min_tls_version }}'
    supportsHttpsTrafficOnly: {{ https_only }}
    allowBlobPublicAccess: {{ allow_public_access }}
  }
  tags: {
    environment: '{{ environment }}'
    costCenter: '{{ cost_center }}'
    project: '{{ project_name }}'
  }
}

Extracted Variables:

variables:
  storage_account_name:
    type: string
    description: "Storage account name (globally unique)"
    default: "mystorageacct12345"
    pattern: "^[a-z0-9]{3,24}$"

  location:
    type: string
    description: "Azure region"
    default: "westeurope"
    allowed_values: ["westeurope", "northeurope", "eastus", "westus"]

  sku_name:
    type: string
    description: "Storage account SKU"
    default: "Standard_LRS"
    allowed_values: ["Standard_LRS", "Standard_GRS", "Premium_LRS"]

  access_tier:
    type: string
    description: "Storage access tier"
    default: "Hot"
    allowed_values: ["Hot", "Cool"]

  min_tls_version:
    type: string
    description: "Minimum TLS version"
    default: "TLS1_2"

  https_only:
    type: boolean
    description: "Require HTTPS traffic only"
    default: true

  allow_public_access:
    type: boolean
    description: "Allow public blob access"
    default: false

  environment:
    type: string
    description: "Environment name"
    default: "production"

  cost_center:
    type: string
    description: "Cost center for billing"
    default: "engineering"

  project_name:
    type: string
    description: "Project name"
    default: "platform"

Decision Tree

Pattern classification flowchart:

START: Analyze value/pattern
    |
    v
┌───────────────────────────────────┐
│ Is it repeated across files?      │
└─────┬─────────────────────┬───────┘
      │ YES                 │ NO
      v                     v
┌─────────────┐      ┌──────────────┐
│ Global Var  │      │ Check Usage  │
└─────────────┘      └──────┬───────┘
                            │
                            v
                     ┌──────────────────────┐
                     │ Used in conditionals? │
                     └──┬─────────────────┬──┘
                        │ YES             │ NO
                        v                 v
                 ┌──────────┐      ┌──────────────┐
                 │ Feature  │      │ Check Type   │
                 │ Flag     │      └──────┬───────┘
                 └──────────┘             │
                                          v
                                   ┌──────────────────────┐
                                   │ Contains 'env'?      │
                                   └──┬───────────────┬───┘
                                      │ YES           │ NO
                                      v               v
                                ┌───────────┐   ┌──────────────┐
                                │ Env Var   │   │ Check Format │
                                └───────────┘   └──────┬───────┘
                                                       │
                                                       v
                                                ┌──────────────────┐
                                                │ Cloud resource?  │
                                                └──┬───────────┬───┘
                                                   │ YES       │ NO
                                                   v           v
                                            ┌──────────┐  ┌─────────┐
                                            │ Resource │  │ Generic │
                                            │ ID       │  │ Config  │
                                            └──────────┘  └─────────┘

Examples

Example 1: Extract Terraform AWS Pattern

Input:

extract patterns from ./terraform/aws/ec2-instances.tf

Analysis:

Analyzing: terraform/aws/ec2-instances.tf
├── Technology: Terraform (AWS Provider)
├── Resources Found: 3 (aws_instance, aws_security_group, aws_eip)
├── Variables Identified: 12
└── Patterns Detected:
    ├── Naming Convention: {project}-{resource}-{env}
    ├── Tag Strategy: Environment, Project, ManagedBy
    └── Configuration Reuse: 85% similarity across instances

Extracted Template:

# Template: aws-ec2-instance.tf.tmpl
variable "project_name" {
  type        = string
  description = "Project identifier"
}

variable "environment" {
  type        = string
  description = "Deployment environment"
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_config" {
  type = object({
    ami           = string
    instance_type = string
    key_name      = string
    subnet_id     = string
  })
  description = "EC2 instance configuration"
}

resource "aws_instance" "main" {
  ami           = var.instance_config.ami
  instance_type = var.instance_config.instance_type
  key_name      = var.instance_config.key_name
  subnet_id     = var.instance_config.subnet_id

  vpc_security_group_ids = [aws_security_group.main.id]

  tags = {
    Name        = "${var.project_name}-instance-${var.environment}"
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "Terraform"
  }
}

resource "aws_security_group" "main" {
  name        = "${var.project_name}-sg-${var.environment}"
  description = "Security group for ${var.project_name} in ${var.environment}"

  tags = {
    Name        = "${var.project_name}-sg-${var.environment}"
    Environment = var.environment
    Project     = var.project_name
  }
}

Example 2: Extract Kubernetes Pattern

Input:

extract patterns from ./k8s/deployments/ --type kubernetes

Analysis:

Analyzing: k8s/deployments/ (15 files)
├── Technology: Kubernetes (v1.24+)
├── Resources Found: 45 total
│   ├── Deployments: 15
│   ├── Services: 12
│   ├── ConfigMaps: 10
│   └── Ingresses: 8
├── Common Patterns:
│   ├── Label Strategy: app, version, environment
│   ├── Resource Requests: 90% use standard sizes
│   ├── Probes: 100% have health checks
│   └── Image Pattern: registry/org/image:tag
└── Variable Candidates: 28 identified

Extracted Template:

# Template: kubernetes-microservice.yaml.tmpl
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ service_name }}-deployment
  namespace: {{ namespace }}
  labels:
    app: {{ service_name }}
    version: {{ version }}
    environment: {{ environment }}
spec:
  replicas: {{ replica_count }}
  selector:
    matchLabels:
      app: {{ service_name }}
  template:
    metadata:
      labels:
        app: {{ service_name }}
        version: {{ version }}
        environment: {{ environment }}
    spec:
      containers:
      - name: {{ service_name }}
        image: {{ image_registry }}/{{ organization }}/{{ service_name }}:{{ version }}
        ports:
        - containerPort: {{ container_port }}
          protocol: TCP
        env:
        {{#each environment_variables}}
        - name: {{ name }}
          value: "{{ value }}"
        {{/each}}
        resources:
          requests:
            memory: {{ memory_request }}
            cpu: {{ cpu_request }}
          limits:
            memory: {{ memory_limit }}
            cpu: {{ cpu_limit }}
        livenessProbe:
          httpGet:
            path: {{ health_check_path }}
            port: {{ container_port }}
          initialDelaySeconds: {{ liveness_initial_delay }}
          periodSeconds: {{ liveness_period }}
        readinessProbe:
          httpGet:
            path: {{ readiness_check_path }}
            port: {{ container_port }}
          initialDelaySeconds: {{ readiness_initial_delay }}
          periodSeconds: {{ readiness_period }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ service_name }}-service
  namespace: {{ namespace }}
  labels:
    app: {{ service_name }}
spec:
  type: {{ service_type }}
  ports:
  - port: {{ service_port }}
    targetPort: {{ container_port }}
    protocol: TCP
  selector:
    app: {{ service_name }}

Example 3: Multi-File Pattern Extraction

Input:

extract patterns from ./infrastructure/ --recursive --consolidate

Analysis:

Analyzing: infrastructure/ (recursive)
├── Files Scanned: 47
├── Technologies Detected:
│   ├── Terraform (AWS): 23 files
│   ├── Kubernetes: 15 files
│   ├── Helm Charts: 9 files
│   └── Docker Compose: 2 files (excluded - different pattern)
├── Cross-Cutting Patterns:
│   ├── Environment Strategy: 3-tier (dev/staging/prod)
│   ├── Tagging: 100% compliance with org policy
│   ├── Naming: Consistent kebab-case with env suffix
│   └── Secrets: HashiCorp Vault references
└── Template Opportunities:
    ├── AWS Lambda Function: 8 similar resources
    ├── RDS Database: 5 similar resources
    ├── Kubernetes Service: 12 similar resources
    └── ALB Configuration: 6 similar resources

Output:

Generated Templates:
├── templates/
│   ├── aws-lambda-function.tf.tmpl (consolidated from 8 files)
│   ├── aws-rds-instance.tf.tmpl (consolidated from 5 files)
│   ├── kubernetes-service.yaml.tmpl (consolidated from 12 files)
│   └── aws-alb.tf.tmpl (consolidated from 6 files)
└── variables/
    ├── common.yaml (shared across all templates)
    ├── aws-specific.yaml (AWS provider variables)
    └── kubernetes-specific.yaml (K8s variables)

Variable Reuse Analysis:
├── Shared Variables: 15 (45% reuse)
├── Template-Specific: 18 (55% unique)
└── Potential Consolidation: 3 variables can be merged

Best Practices

1. Incremental Extraction

Start with a single file or small directory to refine patterns before scaling:

# Start small
extract patterns from ./terraform/main.tf

# Validate and adjust
review template ./templates/main.tf.tmpl

# Scale up
extract patterns from ./terraform/ --recursive

2. Variable Naming Conventions

Follow consistent naming:

Use snake_case for variables
Prefix with context: aws_, k8s_, azure_
Suffix with type: _count, _enabled, _config
Be descriptive: instance_type not type

3. Type Inference Priority

Explicit types from existing variable definitions
Usage patterns (e.g., used in math → number)
Value format (e.g., "true"/"false" → boolean)
Defaults to string if ambiguous

4. Security-Sensitive Patterns

Always flag potential secrets:

# Good - flagged for security review
api_key:
  type: string
  sensitive: true
  description: "API key for external service"
  default: null  # Force explicit value

5. Validation Rules

Add validation for critical variables:

environment:
  type: string
  allowed_values: ["dev", "staging", "prod"]

cidr_block:
  type: string
  pattern: "^\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}/\\d{1,2}$"

replica_count:
  type: integer
  min: 1
  max: 100

6. Documentation Generation

Auto-generate docs from extracted patterns:

# Generated Template: aws-ec2-instance

## Description
Extracted from 8 similar EC2 instance definitions.
Pattern confidence: 95%

## Variables
| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| instance_type | string | Yes | t3.medium | EC2 instance type |
| environment | string | Yes | - | Deployment environment |

## Usage
terraform apply -var="instance_type=t3.large" -var="environment=prod"

7. Pattern Confidence Scoring

Rate extraction confidence:

High (90-100%): Identical patterns across files
Medium (70-89%): Similar with minor variations
Low (<70%): Significant differences, manual review needed

8. Iterative Refinement

# Extract initial patterns
extract patterns from ./infra/ --output ./templates/v1/

# Review and refine
review templates ./templates/v1/ --suggest-improvements

# Re-extract with refinements
extract patterns from ./infra/ --output ./templates/v2/ \
  --naming-convention kebab-case \
  --tag-strategy company-standard

Related Skills

[[template-generation]] - Generate templates from extracted patterns
[[variable-schema-design]] - Design robust variable schemas
[[template-validation]] - Validate generated templates
[[naming-convention-analyzer]] - Analyze and enforce naming conventions
[[configuration-consolidation]] - Merge similar configurations
[[security-pattern-detection]] - Identify security anti-patterns
[[compliance-checking]] - Ensure extracted patterns meet compliance requirements

Advanced Techniques

Multi-Stage Extraction Pipeline

# Stage 1: Initial scan
extract patterns from ./infra/ --stage scan

# Stage 2: Pattern classification
extract patterns from ./infra/ --stage classify

# Stage 3: Variable inference
extract patterns from ./infra/ --stage variables

# Stage 4: Template generation
extract patterns from ./infra/ --stage generate

# Stage 5: Validation
extract patterns from ./infra/ --stage validate

Machine Learning-Enhanced Extraction

Use ML models to improve pattern detection:

Clustering: Group similar configurations
Anomaly Detection: Identify outliers for manual review
Type Inference: Predict variable types from usage
Dependency Analysis: Extract implicit dependencies

Cross-Repository Pattern Mining

Extract patterns across multiple repositories:

extract patterns --repos-file ./repos.txt \
  --output ./org-templates/ \
  --consolidate-org-wide

Output Formats

JSON Schema

{
  "template": "aws-ec2-instance",
  "version": "1.0.0",
  "variables": {
    "instance_type": {
      "type": "string",
      "default": "t3.medium",
      "description": "EC2 instance type"
    }
  }
}

YAML Schema

template: aws-ec2-instance
version: 1.0.0
variables:
  instance_type:
    type: string
    default: t3.medium
    description: EC2 instance type

Terraform Variable Definitions

variable "instance_type" {
  type        = string
  default     = "t3.medium"
  description = "EC2 instance type"
}

Integration Points

CI/CD Pipelines: Auto-extract patterns on code changes
Template Registries: Publish extracted templates
Documentation Systems: Generate docs from patterns
Monitoring: Track pattern usage and evolution
Compliance Tools: Validate against org standards

Troubleshooting

Pattern Detection Issues

Problem: Too many false positives Solution: Increase confidence threshold, add exclusion patterns

Problem: Missing obvious patterns Solution: Check file encoding, adjust regex patterns, enable debug mode

Problem: Inconsistent variable names Solution: Enable smart naming normalization, use naming dictionary

Template Generation Issues

Problem: Templates too generic Solution: Use narrower extraction scope, increase specificity

Problem: Templates too specific Solution: Widen extraction scope, enable cross-file consolidation

Version: 1.0.0 Last Updated: 2026-01-19 Skill Type: Analysis & Extraction Complexity: Advanced