Skip to main content
Databasefusengine

laravel-vector-search

Use when implementing semantic / vector search in Laravel 13 with PostgreSQL + pgvector. Covers schema setup, embedding workflow, and the new query builder methods (`whereVectorSimilarTo`, `selectVectorDistance`, etc.).

Stars
13
Source
fusengine/agents
Updated
2026-05-17
Slug
fusengine--agents--laravel-vector-search
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/fusengine/agents/HEAD/plugins/laravel-expert/skills/laravel-vector-search/SKILL.md -o .claude/skills/laravel-vector-search.md

Drops the SKILL.md into .claude/skills/laravel-vector-search.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Laravel 13 Vector Search (pgvector)

Agent Workflow (MANDATORY)

Before ANY implementation, use TeamCreate to spawn 3 agents:

  1. fuse-ai-pilot:explore-codebase - Check current DB driver (must be PostgreSQL) and existing embedding columns
  2. fuse-ai-pilot:research-expert - Verify pgvector extension version and HNSW vs IVFFlat tradeoffs
  3. mcp__context7__query-docs - Pull laravel.com/docs/13.x/search + queries examples

After implementation, run fuse-ai-pilot:sniper for validation.


Overview

Feature Description
PostgreSQL only Requires pgvector extension; not available on MySQL/SQLite
Schema helper Schema::ensureVectorExtensionExists() enables the extension
Query builder whereVectorSimilarTo(), selectVectorDistance(), whereVectorDistanceLessThan(), orderByVectorDistance()
Auto-embedding Pass a raw string and Laravel generates the embedding via AI SDK
Cosine similarity Default distance; threshold via minSimilarity (0.0 - 1.0)

Critical Rules

  1. Use PostgreSQL - Vector clauses ONLY work on pgsql connections - no fallback to MySQL/SQLite
  2. Create an HNSW index - Without an index, queries do full table scans; > 10k rows means seconds-to-minutes latency
  3. Match dimensions exactly - Insert-time and query-time embedding models MUST share the same dimensions
  4. Cache embeddings - Regenerating embeddings on every request is the #1 cost driver; persist them
  5. Lock the embedding model - Changing the model invalidates ALL stored embeddings; treat the model as a schema field

Architecture

database/migrations/
└── XXXX_create_documents_table.php   # Schema::ensureVectorExtensionExists(), vector(1536) col, HNSW index

app/Models/
└── Document.php                       # casts embedding to array, uses whereVectorSimilarTo

app/Ai/Services/
└── VectorSearchService.php            # encapsulates query + threshold logic

→ See Document-model.php.md for full example


Reference Guide

Topic Reference When to Consult
pgvector setup pgvector-setup.md Migrations + index creation
Embedding workflow embeddings-workflow.md Generating + persisting vectors
Query patterns queries.md whereVectorSimilarTo and friends

Templates

Template When to Use
Document-model.php.md Eloquent model with vector column
VectorSearchService.php.md Reusable service

Quick Reference

Migration

Schema::ensureVectorExtensionExists();

Schema::create('documents', function (Blueprint $table) {
    $table->id();
    $table->text('content');
    $table->vector('embedding', 1536);
    $table->timestamps();
    $table->vectorIndex('embedding', algorithm: 'hnsw');
});

Query

$documents = Document::query()
    ->whereVectorSimilarTo('embedding', 'best wineries in Napa Valley', minSimilarity: 0.4)
    ->limit(10)
    ->get();

→ See VectorSearchService.php.md for complete example


Best Practices

DO

  • Create an HNSW index BEFORE inserting bulk data - faster total ingest
  • Store the embedding model name alongside the vector to detect drift
  • Use minSimilarity 0.3-0.5 as a starting threshold; tune empirically
  • Combine vector search with classic where() for hybrid filtering (date ranges, tenancy)

DON'T

  • Don't run vector queries without an index past a few thousand rows - it becomes a full table scan
  • Don't mix embedding models in the same column - distances become meaningless
  • Don't generate query embeddings inside loops - batch them via Embeddings::for([...])
  • Don't store embeddings as JSON strings - use the native vector column type for index support