Pattern Extraction Skill
When to Use This Skill
Use pattern extraction when you need to:
- Analyze existing infrastructure to create reusable templates
- Convert hardcoded values into configurable template variables
- Identify configuration patterns across multiple similar files
- Detect technology stacks and their conventions
- Extract naming conventions and structural patterns
- Generate template metadata from existing code
- Create variable schemas from inferred types and constraints
Perfect for:
- Converting existing IaC to templates
- Building template libraries from production code
- Standardizing infrastructure patterns
- Automating template generation
- Identifying refactoring opportunities
Core Capabilities
1. Technology Stack Detection
Automatically identify:
- IaC Frameworks: Terraform, Pulumi, CloudFormation, ARM, Bicep
- Cloud Providers: AWS, Azure, GCP, multi-cloud patterns
- Service Types: Kubernetes, Docker, serverless, containers
- Configuration Formats: YAML, JSON, HCL, TOML
- Build Systems: Helm, Kustomize, Jsonnet
- CI/CD Platforms: GitHub Actions, GitLab CI, Jenkins, Harness
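As a rough illustration of how stack detection might work, the sketch below combines a file extension check with a content marker. The rule table and the function name `detect_framework` are assumptions made for this example, not part of the skill itself, and real detection would need many more markers.

```python
from pathlib import Path

# Hypothetical heuristics: (framework, accepted extensions, content markers).
# Order matters: CloudFormation is checked before generic Kubernetes YAML.
FRAMEWORK_RULES = [
    ("Terraform", {".tf"}, ('resource "', 'variable "', 'provider "', 'module "')),
    ("Bicep", {".bicep"}, ("resource ", "param ")),
    ("CloudFormation", {".yaml", ".yml", ".json"}, ("AWSTemplateFormatVersion",)),
    ("Kubernetes", {".yaml", ".yml"}, ("apiVersion:",)),
]


def detect_framework(filename: str, content: str) -> str:
    """Return the first framework whose extension and marker both match."""
    suffix = Path(filename).suffix.lower()
    for name, extensions, markers in FRAMEWORK_RULES:
        if suffix in extensions and any(marker in content for marker in markers):
            return name
    return "unknown"
```

Checking both the extension and the content keeps a plain YAML file from being labelled as Kubernetes just because of its suffix.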
2. Configuration Pattern Extraction
Extract patterns from:
- Resource definitions and relationships
- Environment-specific configurations
- Naming conventions and tagging strategies
- Security policies and compliance rules
- Network topologies and architectures
- Deployment strategies and workflows
3. Variable Identification
Intelligent detection of:
- Hardcoded Values: Strings, numbers, booleans that should be variables
- Repeated Values: Values appearing multiple times across files
- Environment Indicators: dev, staging, prod patterns
- Naming Patterns: Prefixes, suffixes, delimiters
- Secret Patterns: API keys, passwords, tokens (flag for security)
- Configuration Schemas: Type inference from usage
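Repeated-value detection can be sketched as a scan for quoted literals that occur in more than one file. This is a minimal example under stated assumptions: the function name `find_repeated_values` is hypothetical, and the regex only handles simple double-quoted strings without escapes.

```python
import re
from collections import defaultdict

# Simplified: matches double-quoted literals on one line, ignoring escapes.
STRING_LITERAL = re.compile(r'"([^"\n]+)"')


def find_repeated_values(files: dict, min_files: int = 2) -> dict:
    """Map each literal to the sorted list of files containing it.

    Only literals that appear in at least `min_files` files are returned.
    """
    seen = defaultdict(set)
    for name, content in files.items():
        for value in STRING_LITERAL.findall(content):
            seen[value].add(name)
    return {
        value: sorted(names)
        for value, names in seen.items()
        if len(names) >= min_files
    }
```

Literals returned by this scan are candidates for shared variables; values found in a single file are left for the other checks.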
4. Structure Analysis
Analyze:
- File organization and directory structure
- Module boundaries and dependencies
- Resource hierarchies and relationships
- Configuration inheritance patterns
- Composition and reuse strategies
- Template inclusion patterns
Pattern Detection Matrix
| Pattern Type | Indicators | Extraction Method | Output Format |
|---|---|---|---|
| Environment Values | dev, staging, prod in names | Context-aware regex | {{ environment }} |
| Resource Names | Repeated prefixes/suffixes | Token analysis | {{ project_name }}-{{ resource_type }} |
| Region/Location | us-east-1, westeurope | Cloud provider patterns | {{ region }} |
| Version Numbers | Semantic versioning patterns | Regex + validation | {{ version }} |
| Port Numbers | Common service ports | Port range analysis | {{ port }} |
| Size/Scale | Instance types, node counts | Capacity patterns | {{ instance_size }} |
| CIDR Blocks | IP address ranges | Network pattern analysis | {{ cidr_block }} |
| Tags/Labels | Key-value metadata | Metadata extraction | {{ tags }} |
| Secret References | Vault paths, secret names | Secret pattern detection | {{ secret_ref }} (secure) |
| Feature Flags | Boolean toggles | Conditional analysis | {{ enable_feature }} |
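A few rows of the matrix can be approximated with standard-library checks. The sketch below is illustrative only: the function name `suggest_placeholder`, the regexes, and the environment list are assumptions, and the region regex covers only common AWS-style names.

```python
import ipaddress
import re
from typing import Optional

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")
AWS_STYLE_REGION = re.compile(r"^[a-z]{2}-[a-z]+-\d$")  # e.g. us-east-1
ENVIRONMENTS = {"dev", "staging", "prod", "production"}


def suggest_placeholder(value: str) -> Optional[str]:
    """Return a placeholder for a literal value, or None if nothing matches."""
    if value in ENVIRONMENTS:
        return "{{ environment }}"
    if AWS_STYLE_REGION.match(value):
        return "{{ region }}"
    if SEMVER.match(value):
        return "{{ version }}"
    if "/" in value:
        try:
            # strict=False accepts host bits, e.g. 10.0.1.5/24
            ipaddress.ip_network(value, strict=False)
            return "{{ cidr_block }}"
        except ValueError:
            pass
    return None
```

Using `ipaddress` for CIDR blocks rejects out-of-range octets that a purely regex-based check would accept.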
Variable Inference Rules
Before/After Transformation Examples
Example 1: Resource Names
Before:
resource "aws_instance" "web_server" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.medium"
tags = {
Name = "myapp-web-server-production"
Environment = "production"
Project = "myapp"
}
}
After:
resource "aws_instance" "web_server" {
ami = "{{ ami_id }}"
instance_type = "{{ instance_type }}"
tags = {
Name = "{{ project_name }}-web-server-{{ environment }}"
Environment = "{{ environment }}"
Project = "{{ project_name }}"
}
}
Extracted Variables:
variables:
ami_id:
type: string
description: "AMI ID for the EC2 instance"
default: "ami-0c55b159cbfafe1f0"
pattern: "^ami-([a-f0-9]{8}|[a-f0-9]{17})$"
instance_type:
type: string
description: "EC2 instance type"
default: "t3.medium"
allowed_values: ["t3.micro", "t3.small", "t3.medium", "t3.large"]
project_name:
type: string
description: "Project identifier used in resource naming"
default: "myapp"
pattern: "^[a-z][a-z0-9-]{2,30}$"
environment:
type: string
description: "Deployment environment"
default: "production"
allowed_values: ["dev", "staging", "production"]
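The substitution step in this example can be sketched as a single-pass replacement of known literals with placeholders. The function name `templatize` and the literal-to-variable mapping are assumptions for illustration; a real extractor would work on parsed structures rather than raw text.

```python
import re


def templatize(source: str, values: dict) -> str:
    """Replace literal values with {{ variable }} placeholders.

    `values` maps each literal to its variable name. Longer literals are
    tried first, and the single regex pass ensures that text inserted as
    a placeholder is never substituted again.
    """
    if not values:
        return source
    literals = sorted(values, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(literal) for literal in literals))
    return pattern.sub(lambda m: "{{ " + values[m.group(0)] + " }}", source)
```

For instance, mapping `myapp` to `project_name` and `production` to `environment` turns `myapp-web-server-production` into `{{ project_name }}-web-server-{{ environment }}`.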
Example 2: Kubernetes Deployment
Before:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
version: "1.21.0"
spec:
containers:
- name: nginx
image: nginx:1.21.0
ports:
- containerPort: 80
resources:
requests:
memory: "128Mi"
cpu: "250m"
limits:
memory: "256Mi"
cpu: "500m"
After:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ app_name }}-deployment
namespace: {{ namespace }}
spec:
replicas: {{ replica_count }}
selector:
matchLabels:
app: {{ app_name }}
template:
metadata:
labels:
app: {{ app_name }}
version: "{{ app_version }}"
spec:
containers:
- name: {{ app_name }}
image: {{ container_image }}:{{ app_version }}
ports:
- containerPort: {{ container_port }}
resources:
requests:
memory: "{{ memory_request }}"
cpu: "{{ cpu_request }}"
limits:
memory: "{{ memory_limit }}"
cpu: "{{ cpu_limit }}"
Extracted Variables:
variables:
app_name:
type: string
description: "Application name"
default: "nginx"
namespace:
type: string
description: "Kubernetes namespace"
default: "production"
replica_count:
type: integer
description: "Number of pod replicas"
default: 3
min: 1
max: 10
app_version:
type: string
description: "Application version"
default: "1.21.0"
pattern: "^\\d+\\.\\d+\\.\\d+$"
container_image:
type: string
description: "Container image name"
default: "nginx"
container_port:
type: integer
description: "Container port"
default: 80
memory_request:
type: string
description: "Memory request"
default: "128Mi"
memory_limit:
type: string
description: "Memory limit"
default: "256Mi"
cpu_request:
type: string
description: "CPU request"
default: "250m"
cpu_limit:
type: string
description: "CPU limit"
default: "500m"
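One way to sanity-check an extraction like this is to render the template with the extracted defaults and compare the result with the original file. The renderer below is a minimal sketch that only understands plain `{{ name }}` placeholders; the function name `render` is an assumption.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Substitute {{ name }} placeholders, failing on unknown names."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing template variable: {key}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)
```

If `render(template, defaults)` does not reproduce the original manifest, either a default or a placeholder was extracted incorrectly.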
Example 3: Azure Bicep
Before:
resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
name: 'mystorageacct12345'
location: 'westeurope'
sku: {
name: 'Standard_LRS'
}
kind: 'StorageV2'
properties: {
accessTier: 'Hot'
minimumTlsVersion: 'TLS1_2'
supportsHttpsTrafficOnly: true
allowBlobPublicAccess: false
}
tags: {
environment: 'production'
costCenter: 'engineering'
project: 'platform'
}
}
After:
resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
name: '{{ storage_account_name }}'
location: '{{ location }}'
sku: {
name: '{{ sku_name }}'
}
kind: 'StorageV2'
properties: {
accessTier: '{{ access_tier }}'
minimumTlsVersion: '{{ min_tls_version }}'
supportsHttpsTrafficOnly: {{ https_only }}
allowBlobPublicAccess: {{ allow_public_access }}
}
tags: {
environment: '{{ environment }}'
costCenter: '{{ cost_center }}'
project: '{{ project_name }}'
}
}
Extracted Variables:
variables:
storage_account_name:
type: string
description: "Storage account name (globally unique)"
default: "mystorageacct12345"
pattern: "^[a-z0-9]{3,24}$"
location:
type: string
description: "Azure region"
default: "westeurope"
allowed_values: ["westeurope", "northeurope", "eastus", "westus"]
sku_name:
type: string
description: "Storage account SKU"
default: "Standard_LRS"
allowed_values: ["Standard_LRS", "Standard_GRS", "Premium_LRS"]
access_tier:
type: string
description: "Storage access tier"
default: "Hot"
allowed_values: ["Hot", "Cool"]
min_tls_version:
type: string
description: "Minimum TLS version"
default: "TLS1_2"
https_only:
type: boolean
description: "Require HTTPS traffic only"
default: true
allow_public_access:
type: boolean
description: "Allow public blob access"
default: false
environment:
type: string
description: "Environment name"
default: "production"
cost_center:
type: string
description: "Cost center for billing"
default: "engineering"
project_name:
type: string
description: "Project name"
default: "platform"
Decision Tree
Pattern classification flowchart:
START: Analyze value/pattern
|
v
┌───────────────────────────────────┐
│ Is it repeated across files? │
└─────┬─────────────────────┬───────┘
│ YES │ NO
v v
┌─────────────┐ ┌──────────────┐
│ Global Var │ │ Check Usage │
└─────────────┘ └──────┬───────┘
│
v
┌──────────────────────┐
│ Used in conditionals? │
└──┬─────────────────┬──┘
│ YES │ NO
v v
┌──────────┐ ┌──────────────┐
│ Feature │ │ Check Type │
│ Flag │ └──────┬───────┘
└──────────┘ │
v
┌──────────────────────┐
│ Contains 'env'? │
└──┬───────────────┬───┘
│ YES │ NO
v v
┌───────────┐ ┌──────────────┐
│ Env Var │ │ Check Format │
└───────────┘ └──────┬───────┘
│
v
┌──────────────────┐
│ Cloud resource? │
└──┬───────────┬───┘
│ YES │ NO
v v
┌──────────┐ ┌─────────┐
│ Resource │ │ Generic │
│ ID │ │ Config │
└──────────┘ └─────────┘
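The flowchart translates directly into a chain of checks. The sketch below assumes the inputs have already been computed; the `Candidate` fields and the returned labels are hypothetical names, and the "Contains 'env'?" test is interpreted here as a check on the variable name.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                   # proposed variable name
    file_count: int             # number of files the value appears in
    used_in_conditional: bool   # value drives an if/count/for_each style toggle
    is_cloud_resource_id: bool  # value looks like a provider resource ID


def classify(candidate: Candidate) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if candidate.file_count > 1:
        return "global_variable"
    if candidate.used_in_conditional:
        return "feature_flag"
    if "env" in candidate.name.lower():
        return "environment_variable"
    if candidate.is_cloud_resource_id:
        return "resource_id"
    return "generic_config"
```

Because the checks run in flowchart order, a value repeated across files is classified as a global variable even if it is also used in a conditional.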
Examples
Example 1: Extract Terraform AWS Pattern
Input:
extract patterns from ./terraform/aws/ec2-instances.tf
Analysis:
Analyzing: terraform/aws/ec2-instances.tf
├── Technology: Terraform (AWS Provider)
├── Resources Found: 3 (aws_instance, aws_security_group, aws_eip)
├── Variables Identified: 12
└── Patterns Detected:
├── Naming Convention: {project}-{resource}-{env}
├── Tag Strategy: Environment, Project, ManagedBy
└── Configuration Reuse: 85% similarity across instances
Extracted Template:
# Template: aws-ec2-instance.tf.tmpl
variable "project_name" {
type = string
description = "Project identifier"
}
variable "environment" {
type = string
description = "Deployment environment"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
variable "instance_config" {
type = object({
ami = string
instance_type = string
key_name = string
subnet_id = string
})
description = "EC2 instance configuration"
}
resource "aws_instance" "main" {
ami = var.instance_config.ami
instance_type = var.instance_config.instance_type
key_name = var.instance_config.key_name
subnet_id = var.instance_config.subnet_id
vpc_security_group_ids = [aws_security_group.main.id]
tags = {
Name = "${var.project_name}-instance-${var.environment}"
Environment = var.environment
Project = var.project_name
ManagedBy = "Terraform"
}
}
resource "aws_security_group" "main" {
name = "${var.project_name}-sg-${var.environment}"
description = "Security group for ${var.project_name} in ${var.environment}"
tags = {
Name = "${var.project_name}-sg-${var.environment}"
Environment = var.environment
Project = var.project_name
}
}
Example 2: Extract Kubernetes Pattern
Input:
extract patterns from ./k8s/deployments/ --type kubernetes
Analysis:
Analyzing: k8s/deployments/ (15 files)
├── Technology: Kubernetes (v1.24+)
├── Resources Found: 45 total
│ ├── Deployments: 15
│ ├── Services: 12
│ ├── ConfigMaps: 10
│ └── Ingresses: 8
├── Common Patterns:
│ ├── Label Strategy: app, version, environment
│ ├── Resource Requests: 90% use standard sizes
│ ├── Probes: 100% have health checks
│ └── Image Pattern: registry/org/image:tag
└── Variable Candidates: 28 identified
Extracted Template:
# Template: kubernetes-microservice.yaml.tmpl
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ service_name }}-deployment
namespace: {{ namespace }}
labels:
app: {{ service_name }}
version: "{{ version }}"
environment: {{ environment }}
spec:
replicas: {{ replica_count }}
selector:
matchLabels:
app: {{ service_name }}
template:
metadata:
labels:
app: {{ service_name }}
version: "{{ version }}"
environment: {{ environment }}
spec:
containers:
- name: {{ service_name }}
image: {{ image_registry }}/{{ organization }}/{{ service_name }}:{{ version }}
ports:
- containerPort: {{ container_port }}
protocol: TCP
env:
{{#each environment_variables}}
- name: {{ name }}
value: "{{ value }}"
{{/each}}
resources:
requests:
memory: {{ memory_request }}
cpu: {{ cpu_request }}
limits:
memory: {{ memory_limit }}
cpu: {{ cpu_limit }}
livenessProbe:
httpGet:
path: {{ health_check_path }}
port: {{ container_port }}
initialDelaySeconds: {{ liveness_initial_delay }}
periodSeconds: {{ liveness_period }}
readinessProbe:
httpGet:
path: {{ readiness_check_path }}
port: {{ container_port }}
initialDelaySeconds: {{ readiness_initial_delay }}
periodSeconds: {{ readiness_period }}
---
apiVersion: v1
kind: Service
metadata:
name: {{ service_name }}-service
namespace: {{ namespace }}
labels:
app: {{ service_name }}
spec:
type: {{ service_type }}
ports:
- port: {{ service_port }}
targetPort: {{ container_port }}
protocol: TCP
selector:
app: {{ service_name }}
Example 3: Multi-File Pattern Extraction
Input:
extract patterns from ./infrastructure/ --recursive --consolidate
Analysis:
Analyzing: infrastructure/ (recursive)
├── Files Scanned: 47
├── Technologies Detected:
│ ├── Terraform (AWS): 23 files
│ ├── Kubernetes: 15 files
│ ├── Helm Charts: 9 files
│ └── Docker Compose: 2 files (excluded - different pattern)
├── Cross-Cutting Patterns:
│ ├── Environment Strategy: 3-tier (dev/staging/prod)
│ ├── Tagging: 100% compliance with org policy
│ ├── Naming: Consistent kebab-case with env suffix
│ └── Secrets: HashiCorp Vault references
└── Template Opportunities:
├── AWS Lambda Function: 8 similar resources
├── RDS Database: 5 similar resources
├── Kubernetes Service: 12 similar resources
└── ALB Configuration: 6 similar resources
Output:
Generated Templates:
├── templates/
│ ├── aws-lambda-function.tf.tmpl (consolidated from 8 files)
│ ├── aws-rds-instance.tf.tmpl (consolidated from 5 files)
│ ├── kubernetes-service.yaml.tmpl (consolidated from 12 files)
│ └── aws-alb.tf.tmpl (consolidated from 6 files)
└── variables/
├── common.yaml (shared across all templates)
├── aws-specific.yaml (AWS provider variables)
└── kubernetes-specific.yaml (K8s variables)
Variable Reuse Analysis:
├── Shared Variables: 15 (45% reuse)
├── Template-Specific: 18 (55% unique)
└── Potential Consolidation: 3 variables can be merged
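The reuse figures in a report like this could be derived by counting how many templates reference each variable. The sketch below is an assumption about how such a summary might be computed; `reuse_summary` and its output keys are hypothetical.

```python
from collections import Counter


def reuse_summary(templates: dict) -> dict:
    """Split variables into shared (used by 2+ templates) and specific.

    `templates` maps a template name to the set of variable names it uses.
    """
    counts = Counter(
        variable for variables in templates.values() for variable in variables
    )
    shared = sorted(name for name, n in counts.items() if n > 1)
    specific = sorted(name for name, n in counts.items() if n == 1)
    total = len(counts)
    reuse_pct = round(100 * len(shared) / total) if total else 0
    return {"shared": shared, "specific": specific, "reuse_pct": reuse_pct}
```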
Best Practices
1. Incremental Extraction
Start with a single file or small directory to refine patterns before scaling:
# Start small
extract patterns from ./terraform/main.tf
# Validate and adjust
review template ./templates/main.tf.tmpl
# Scale up
extract patterns from ./terraform/ --recursive
2. Variable Naming Conventions
Follow consistent naming:
- Use snake_case for variables
- Prefix with context: aws_, k8s_, azure_
- Suffix with type: _count, _enabled, _config
- Be descriptive: instance_type not type
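Names found in existing code rarely follow one convention, so a small normalizer can help. This sketch converts camelCase, kebab-case, and spaced names to snake_case; the function name `to_snake_case` is an assumption, and acronym handling is deliberately simple.

```python
import re


def to_snake_case(name: str) -> str:
    """Normalize a variable name to snake_case."""
    # Split a lowercase letter or digit followed by an uppercase letter:
    # instanceType -> instance_Type
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", spaced).strip("_").lower()
```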
3. Type Inference Priority
- Explicit types from existing variable definitions
- Usage patterns (e.g., used in math → number)
- Value format (e.g., "true"/"false" → boolean)
- Defaults to string if ambiguous
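The priority order above can be expressed as a short function. This is a sketch under stated assumptions: the parameter names are hypothetical, and the integer check is an addition for illustration that the list above does not mention.

```python
import re
from typing import Optional


def infer_type(
    raw: str, declared: Optional[str] = None, used_in_math: bool = False
) -> str:
    """Infer a variable type following the priority order."""
    if declared:                          # 1. explicit type wins
        return declared
    if used_in_math:                      # 2. usage pattern
        return "number"
    text = raw.strip().lower()
    if text in ("true", "false"):         # 3. value format
        return "boolean"
    if re.fullmatch(r"-?\d+", text):      # assumption: not in the list above
        return "integer"
    return "string"                       # 4. ambiguous, default to string
```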
4. Security-Sensitive Patterns
Always flag potential secrets:
# Good - flagged for security review
api_key:
type: string
sensitive: true
description: "API key for external service"
default: null # Force explicit value
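Flagging might be driven by the variable name and by the shape of the value. The sketch below is a minimal example, not a complete secret scanner: the name regex and the helper names are assumptions, and the only value pattern checked is the `AKIA` prefix used by AWS access key IDs.

```python
import re

SECRET_NAME = re.compile(r"(password|passwd|secret|token|api[_-]?key)", re.IGNORECASE)
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def is_sensitive(name: str, value: str) -> bool:
    """Flag a variable when its name or value looks like a secret."""
    return bool(SECRET_NAME.search(name) or AWS_ACCESS_KEY_ID.search(value))


def to_schema_entry(name: str, value: str) -> dict:
    """Build a schema entry, dropping the default for sensitive values."""
    if is_sensitive(name, value):
        # Never copy the discovered secret into the template defaults.
        return {"type": "string", "sensitive": True, "default": None}
    return {"type": "string", "default": value}
```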
5. Validation Rules
Add validation for critical variables:
environment:
type: string
allowed_values: ["dev", "staging", "prod"]
cidr_block:
type: string
pattern: "^\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}/\\d{1,2}$"
replica_count:
type: integer
min: 1
max: 100
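Rules like these could be enforced with a small checker. The sketch below assumes each rule is a plain dictionary using the keys shown above (`allowed_values`, `pattern`, `min`, `max`); the function name `validate` is hypothetical.

```python
import re


def validate(value, rule: dict) -> list:
    """Return a list of human-readable violations (empty when valid)."""
    errors = []
    if "allowed_values" in rule and value not in rule["allowed_values"]:
        errors.append(f"{value!r} is not one of {rule['allowed_values']}")
    if "pattern" in rule and not re.fullmatch(rule["pattern"], str(value)):
        errors.append(f"{value!r} does not match {rule['pattern']}")
    # Range checks apply to numbers only (bool is excluded on purpose).
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "min" in rule and value < rule["min"]:
            errors.append(f"{value} is below the minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{value} is above the maximum {rule['max']}")
    return errors
```

Note that the CIDR regex above checks only the shape of the value, so it still accepts out-of-range octets such as 999.0.0.0/8.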
6. Documentation Generation
Auto-generate docs from extracted patterns:
# Generated Template: aws-ec2-instance
## Description
Extracted from 8 similar EC2 instance definitions.
Pattern confidence: 95%
## Variables
| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| instance_type | string | No | t3.medium | EC2 instance type |
| environment | string | Yes | - | Deployment environment |
## Usage
terraform apply -var="instance_type=t3.large" -var="environment=prod"
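The variables table in such a document could be generated from the extracted schema. The sketch below assumes a variable counts as required only when it has no default; `render_variable_table` is a hypothetical helper name.

```python
def render_variable_table(variables: dict) -> str:
    """Render a Markdown table from a {name: spec} variable schema."""
    lines = [
        "| Name | Type | Required | Default | Description |",
        "|------|------|----------|---------|-------------|",
    ]
    for name, spec in variables.items():
        has_default = spec.get("default") is not None
        required = "No" if has_default else "Yes"
        default = str(spec["default"]) if has_default else "-"
        description = spec.get("description", "")
        lines.append(
            f"| {name} | {spec['type']} | {required} | {default} | {description} |"
        )
    return "\n".join(lines)
```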
7. Pattern Confidence Scoring
Rate extraction confidence:
- High (90-100%): Identical patterns across files
- Medium (70-89%): Similar with minor variations
- Low (<70%): Significant differences, manual review needed
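One possible way to produce such a score is the mean pairwise text similarity of the source snippets. The sketch below uses `difflib.SequenceMatcher` from the standard library as the similarity measure; that choice, and both function names, are assumptions rather than the skill's actual scoring method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pattern_confidence(snippets: list) -> float:
    """Mean pairwise similarity of the snippets, as a percentage."""
    pairs = list(combinations(snippets, 2))
    if not pairs:
        return 100.0  # zero or one snippet: nothing to disagree with
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return 100.0 * total / len(pairs)


def confidence_band(score: float) -> str:
    """Map a percentage score onto the bands listed above."""
    if score >= 90:
        return "high"
    if score >= 70:
        return "medium"
    return "low"
```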
8. Iterative Refinement
# Extract initial patterns
extract patterns from ./infra/ --output ./templates/v1/
# Review and refine
review templates ./templates/v1/ --suggest-improvements
# Re-extract with refinements
extract patterns from ./infra/ --output ./templates/v2/ \
--naming-convention kebab-case \
--tag-strategy company-standard
Related Skills
- [[template-generation]] - Generate templates from extracted patterns
- [[variable-schema-design]] - Design robust variable schemas
- [[template-validation]] - Validate generated templates
- [[naming-convention-analyzer]] - Analyze and enforce naming conventions
- [[configuration-consolidation]] - Merge similar configurations
- [[security-pattern-detection]] - Identify security anti-patterns
- [[compliance-checking]] - Ensure extracted patterns meet compliance requirements
Advanced Techniques
Multi-Stage Extraction Pipeline
# Stage 1: Initial scan
extract patterns from ./infra/ --stage scan
# Stage 2: Pattern classification
extract patterns from ./infra/ --stage classify
# Stage 3: Variable inference
extract patterns from ./infra/ --stage variables
# Stage 4: Template generation
extract patterns from ./infra/ --stage generate
# Stage 5: Validation
extract patterns from ./infra/ --stage validate
Machine Learning-Enhanced Extraction
Use ML models to improve pattern detection:
- Clustering: Group similar configurations
- Anomaly Detection: Identify outliers for manual review
- Type Inference: Predict variable types from usage
- Dependency Analysis: Extract implicit dependencies
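As a simple stand-in for the clustering step, similar configurations can be grouped greedily by text similarity. This is not a trained model: it is a threshold-based grouping built on `difflib`, and the function name `group_similar` and the default threshold are assumptions.

```python
from difflib import SequenceMatcher


def group_similar(configs: dict, threshold: float = 0.8) -> list:
    """Greedily group config names whose text is similar.

    Each config is compared with the first member of every existing
    group and joins the first group that meets the threshold.
    """
    groups = []
    for name, text in configs.items():
        for group in groups:
            representative = configs[group[0]]
            if SequenceMatcher(None, representative, text).ratio() >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no close match: start a new group
    return groups
```

Groups with a single member are natural candidates for the manual review mentioned under anomaly detection.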
Cross-Repository Pattern Mining
Extract patterns across multiple repositories:
extract patterns --repos-file ./repos.txt \
--output ./org-templates/ \
--consolidate-org-wide
Output Formats
JSON Schema
{
"template": "aws-ec2-instance",
"version": "1.0.0",
"variables": {
"instance_type": {
"type": "string",
"default": "t3.medium",
"description": "EC2 instance type"
}
}
}
YAML Schema
template: aws-ec2-instance
version: 1.0.0
variables:
instance_type:
type: string
default: t3.medium
description: EC2 instance type
Terraform Variable Definitions
variable "instance_type" {
type = string
default = "t3.medium"
description = "EC2 instance type"
}
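Converting between these formats is mostly mechanical. The sketch below renders one schema entry as a Terraform variable block; the helper name and the type mapping (for example `integer` to `number`) are assumptions, and only scalar defaults are handled.

```python
import json

# Assumed mapping from schema type names to Terraform type keywords.
TYPE_MAP = {"string": "string", "integer": "number", "number": "number", "boolean": "bool"}


def to_terraform_variable(name: str, spec: dict) -> str:
    """Render a schema entry as a Terraform variable block."""
    tf_type = TYPE_MAP.get(spec["type"], spec["type"])
    lines = [f'variable "{name}" {{', f"  type        = {tf_type}"]
    if "default" in spec:
        # json.dumps yields quoted strings, bare numbers, and true/false,
        # which are also valid HCL literals for scalar values.
        lines.append(f"  default     = {json.dumps(spec['default'])}")
    if "description" in spec:
        lines.append(f"  description = {json.dumps(spec['description'])}")
    lines.append("}")
    return "\n".join(lines)
```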
Integration Points
- CI/CD Pipelines: Auto-extract patterns on code changes
- Template Registries: Publish extracted templates
- Documentation Systems: Generate docs from patterns
- Monitoring: Track pattern usage and evolution
- Compliance Tools: Validate against org standards
Troubleshooting
Pattern Detection Issues
Problem: Too many false positives
Solution: Increase confidence threshold, add exclusion patterns

Problem: Missing obvious patterns
Solution: Check file encoding, adjust regex patterns, enable debug mode

Problem: Inconsistent variable names
Solution: Enable smart naming normalization, use naming dictionary
Template Generation Issues
Problem: Templates too generic
Solution: Use narrower extraction scope, increase specificity

Problem: Templates too specific
Solution: Widen extraction scope, enable cross-file consolidation
Version: 1.0.0
Last Updated: 2026-01-19
Skill Type: Analysis & Extraction
Complexity: Advanced