Skip to main content
Generalmotherduckdb

motherduck-load-data

Load data into MotherDuck from local files, object storage, HTTPS, dataframes, or external databases. Use when choosing a MotherDuck-specific ingestion path, especially CTAS and INSERT...SELECT, bulk loading, secrets, and Postgres-endpoint versus DuckDB-client tradeoffs.

Stars
28
Source
motherduckdb/agent-skills
Updated
2026-05-26
Slug
motherduckdb--agent-skills--motherduck-load-data
View on GitHubRaw SKILL.md

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/motherduckdb/agent-skills/HEAD/plugins/motherduck-skills-claude/skills/motherduck-load-data/SKILL.md -o .claude/skills/motherduck-load-data.md

Drops the SKILL.md into .claude/skills/motherduck-load-data.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Load Data into MotherDuck

Use this skill when the job is getting data into MotherDuck correctly and efficiently, not just writing one ad hoc import query.

Source Of Truth

  • Prefer current MotherDuck loading, cloud-storage, and Postgres-endpoint loading docs first.
  • Use CREATE SECRET and cloud-storage docs for protected-object-store workflows.
  • Use the DuckDB database upload docs when the source is an existing local .duckdb, .ddb, or attached DuckDB database.
  • Keep the loading advice aligned with MotherDuck's documented posture:
    • batch over streaming
    • Parquet over CSV when you control the format
    • dataframe, COPY, CTAS, or INSERT ... SELECT over row-by-row inserts
    • native MotherDuck storage first unless DuckLake is explicitly required

Default Posture

  • Start by classifying the source: object storage or HTTPS, local file or local DuckDB, in-memory rows, or an external database.
  • Prefer CREATE TABLE AS SELECT for first loads and INSERT INTO ... SELECT for appends.
  • For whole local DuckDB databases, use CREATE OR REPLACE DATABASE remote_name FROM CURRENT_DATABASE(), an attached local database, or a file path from a native DuckDB client after attaching md:.
  • Use Parquet for durable bulk movement whenever you control the source format.
  • Treat the Postgres endpoint as a thin-client path for server-side remote reads, not for local-file or extension-driven ingestion.
  • Bootstrap the target MotherDuck database first when the ingestion tool does not create it automatically.
  • Keep raw landing minimally transformed; do typing, deduplication, and business logic in staging or modeling steps.
  • Keep source storage close to the MotherDuck region when you control placement.

Workflow

  1. Identify where the source data actually lives.
  2. Choose the loading path:
    • object storage or HTTPS: remote read into MotherDuck
    • local file or local DuckDB: use a DuckDB client path
    • in-memory rows: Arrow or dataframe bulk load first, batched inserts only as a fallback
    • external database: use the appropriate scan or replication path from a DuckDB-capable environment
  3. Land the data into a raw or staging table with minimal transformation.
  4. Validate row counts, types, and a few business aggregates immediately after the load.
  5. Promote into modeled tables only after the landing step is correct.

Open Next

  • references/INGESTION_PATTERNS.md for format-specific options, cloud-storage secrets, Postgres-endpoint loading tradeoffs, Python dataframe paths, and advanced ingestion patterns

Related Skills

  • motherduck-connect for choosing between the Postgres endpoint and a DuckDB client path
  • motherduck-explore for inspecting destination databases and validating landed tables
  • motherduck-query for writing CTAS, append, and validation SQL
  • motherduck-model-data for promoting landed data into staging and analytics tables
  • motherduck-ducklake only when object-storage-backed lakehouse storage is an explicit requirement