Load Data into MotherDuck
Use this skill when the job is getting data into MotherDuck correctly and efficiently, not just writing one ad hoc import query.
Source Of Truth
- Prefer current MotherDuck loading, cloud-storage, and Postgres-endpoint loading docs first.
- Use
CREATE SECRETand cloud-storage docs for protected-object-store workflows. - Use the DuckDB database upload docs when the source is an existing local
.duckdb,.ddb, or attached DuckDB database. - Keep the loading advice aligned with MotherDuck's documented posture:
- batch over streaming
- Parquet over CSV when you control the format
- dataframe,
COPY, CTAS, orINSERT ... SELECTover row-by-row inserts - native MotherDuck storage first unless DuckLake is explicitly required
Default Posture
- Start by classifying the source: object storage or HTTPS, local file or local DuckDB, in-memory rows, or an external database.
- Prefer
CREATE TABLE AS SELECTfor first loads andINSERT INTO ... SELECTfor appends. - For whole local DuckDB databases, use
CREATE OR REPLACE DATABASE remote_name FROM CURRENT_DATABASE(), an attached local database, or a file path from a native DuckDB client after attachingmd:. - Use Parquet for durable bulk movement whenever you control the source format.
- Treat the Postgres endpoint as a thin-client path for server-side remote reads, not for local-file or extension-driven ingestion.
- Bootstrap the target MotherDuck database first when the ingestion tool does not create it automatically.
- Keep raw landing minimally transformed; do typing, deduplication, and business logic in staging or modeling steps.
- Keep source storage close to the MotherDuck region when you control placement.
Workflow
- Identify where the source data actually lives.
- Choose the loading path:
- object storage or HTTPS: remote read into MotherDuck
- local file or local DuckDB: use a DuckDB client path
- in-memory rows: Arrow or dataframe bulk load first, batched inserts only as a fallback
- external database: use the appropriate scan or replication path from a DuckDB-capable environment
- Land the data into a raw or staging table with minimal transformation.
- Validate row counts, types, and a few business aggregates immediately after the load.
- Promote into modeled tables only after the landing step is correct.
Open Next
references/INGESTION_PATTERNS.mdfor format-specific options, cloud-storage secrets, Postgres-endpoint loading tradeoffs, Python dataframe paths, and advanced ingestion patterns
Related Skills
motherduck-connectfor choosing between the Postgres endpoint and a DuckDB client pathmotherduck-explorefor inspecting destination databases and validating landed tablesmotherduck-queryfor writing CTAS, append, and validation SQLmotherduck-model-datafor promoting landed data into staging and analytics tablesmotherduck-ducklakeonly when object-storage-backed lakehouse storage is an explicit requirement