Implement a data product from its data contract
Turn an Entropy Data data product into a working dbt pipeline. The data contract (ODCS) is the source of truth for output schema; this skill reads it and writes the dbt artifacts that produce data matching the contract.
When to use this vs. other skills
- Empty directory, no dbt project yet → run
dataproduct-bootstrapfirst, then come back here. - Existing dbt project, need ODPS/ODCS/OpenLineage scaffolding only → use
entropy-data-syncinstead. - Existing dbt project, want to derive models from a published data contract → this skill.
How to run this skill
${PLUGIN_ROOT}below refers to the root of this plugin — the directory that containsskills/. On Claude Code it is set automatically as${CLAUDE_PLUGIN_ROOT}— use that. On any other agent (Codex, Copilot CLI, etc.) it is unset; resolve it as../..relative to thisSKILL.mdfile's directory (i.e. the grandparent ofskills/<this-skill>/).
Plan announcement (before Step 0)
Before running Step 0, print this plan to the user verbatim:
Running dataproduct-implement. I'll:
- Pre-checks: confirm this is a dbt project, the
dbtCLI is installed, and theentropy-dataCLI is connected.- Resolve the data product by id or URL (
entropy-data dataproducts get).- Fetch each selected output port's data contract (
entropy-data datacontracts get) and save it next to the SQL it governs, undermodels/output_ports/v<N>/.- Validate the contract against the target platform's conventions (e.g. UPPERCASE identifiers on Snowflake). If fixable bugs are found, offer to patch and publish the corrected contract back to Entropy Data.
- Translate the ODCS schema into dbt models under
models/output_ports/v1/(column list, types, tests).- Implement the dbt model bodies: declare input ports as dbt sources, cache each upstream contract under
models/input_ports/<provider-op-id>.odcs.yamlas a trust snapshot, and write theselectfrom input ports to output columns (with confirmation; complex joins left as TODOs).- Stamp the data product on Entropy Data with the
dataProductBuildercustomProperty so the platform knows it is managed by this builder.- Hand off to
entropy-data-syncto add any missing publishing artifacts (ODPS, OpenLineage, GitHub Actions).- Summarize what was generated and the open TODOs.
Then proceed.
Step 0 — Pre-checks
- Confirm
dbt_project.ymlexists at the working directory root. If not, ask whether to rundataproduct-bootstrapfirst, then stop. - Confirm
uv run --quiet dbt --versionsucceeds from the project root. If it fails, runuv syncand retry; if still missing, stop and tell the user to add the dbt adapter for their warehouse topyproject.toml's[dependency-groups].dev(e.g.dbt-snowflake,dbt-databricks,dbt-bigquery,dbt-postgres) and re-runuv sync. Useuv run dbt …for every dbt CLI invocation in this skill. - Confirm
uv run --quiet entropy-data --versionsucceeds from the project root. If it fails, runuv syncand retry. Once available, runuv run entropy-data connection test. If that fails, stop and tell the user to runuv run entropy-data connection add <name> --host <host> --api-key <key>. Useuv run entropy-data …for every CLI invocation in this skill.
Step 1 — Resolve the data product
Accept either:
- a full URL (e.g.
https://app.entropy-data.com/dataproducts/<id>) — extract the trailing id, or - a bare data product id.
Run entropy-data dataproducts get <id> -o yaml. Remember the response as DATA_PRODUCT. Extract:
DATA_PRODUCT_ID,DATA_PRODUCT_NAME, owning team, purpose- the list of output ports — each has an id, a server (catalog/schema/table), and a linked data contract id
If the data product has more than one output port, ask the user which one(s) to implement in this run. Default to all.
If the data product does not exist in Entropy Data, ask the user if they want to create a new one.
Step 2 — Fetch the data contracts
For each selected output port, run entropy-data datacontracts get <contract-id> -o yaml with the contract id from the data product. Remember the response as CONTRACT, and write it to models/output_ports/v<N>/<contract-id>.odcs.yaml (the version directory matches the output port's version — default v1 if the data product does not declare one). If the file already exists and differs from the fetched contract, surface the diff and ask before overwriting.
The fields you need from CONTRACT:
models(table name → list of fields withtype,required,unique,description,classification)servers(so the output port's server config is consistent with the contract)termsandqualityrules — useful context but not required to materialize the model
Step 2.5 — Validate the contract against the target platform
Scan the contract for convention bugs and offer to fix them in one pass, with the patched contract published back to Entropy Data. Dispatched off servers[].type. Server types not listed below are skipped silently — add a section when extending support.
Snowflake — for every property in every schema covered by a type: snowflake server:
- Mixed-case
name. Snowflake folds unquoted identifiers to UPPERCASE;datacontract-cli(≥ 0.11.5) quotesnameverbatim. Any lowercase letter innamemakes the soda query miss the stored UPPERCASE column. Normalizenameto UPPERCASE. - Redundant
physicalName. WhenphysicalNameequals the UPPERCASE form ofname, drop it.
If nothing is flagged, continue silently to Step 3. Otherwise list every fix (one bullet per property × issue) and ask:
Found N convention issue(s) for
<server-type>on contract<CONTRACT_ID>:
- property
<old-name>: renamename→<NEW-NAME>- property
<NEW-NAME>: drop redundantphysicalName: <value>Apply, save to
models/output_ports/v<N>/<contract-file>.odcs.yaml, and publish back to Entropy Data? [Y/n]
On Y: patch the local file with yq -i, keep version unchanged (convention fix, not a schema change), entropy-data datacontracts put <CONTRACT_ID> --file <path>, re-read CONTRACT, continue. Stop on non-2xx.
On n: warn that datacontract test will fail on the un-normalized properties and continue. Don't re-ask this run.
Step 3 — Translate ODCS schema to dbt artifacts
Output column identifier rule (applies to this step and Step 4). Use the contract property's name directly as the SQL alias and the _models.yml columns: - name: entry. Don't substitute physicalName — datacontract test queries by name. Per-warehouse case conventions are enforced by Step 2.5; this step trusts the post-validation name.
For each contract:
Decide a dbt-side table name. Default: the
modelskey in the contract. Confirm with the user if it differs from the output-port server's table name.Identify candidate input ports. Run
entropy-data access list --consumer-dataproduct <DATA_PRODUCT_ID> -o jsonto list the access agreements where this product is the consumer. Each entry'sprovider.dataProductId/provider.outputPortIdis an input port this product can read. Keep only agreements withinfo.active: true(statusapproved); ignorepending/rejected. Only fall back to a broaderentropy-data search queryif the user explicitly asks. Ifmodels/input_ports/<provider-output-port-id>.source.yamlalready exists for an agreement, treat it as authoritative and skip recreating it.Generate
models/output_ports/v1/<table>.sql— a stubselectthat lists the contract columns explicitly withcast(... as <warehouse-type>) as <column>. Leave thefromclause as a TODO with a comment listing the candidate input ports from the previous step; do not invent business logic. Prepend a one-line header comment so a reader of the file knows which contract governs the schema:-- Governed by <contract-file>.odcs.yaml (ODCS id: <CONTRACT_ID>)The contract file sits in the same directory as the SQL, so the comment names the file without a path prefix.
Append the column list to
models/output_ports/v1/_models.ymlundermodels:— name, description (from contract), and tests derived from the contract:not_nullforrequired: true,uniqueforunique: true,accepted_valuesif the contract defines an enum. Add aconfig.meta.data_contractblock on the model that points back to the contract — this is the machine-readable counterpart to the SQL header comment, sodbt list --select config.meta.data_contract.id:<id>, dbt-docs, and lineage tooling can discover the link:models: - name: <table> description: <from contract> config: meta: data_contract: id: <CONTRACT_ID> file: models/output_ports/v<N>/<contract-file>.odcs.yaml columns: - ...Map ODCS types to the warehouse dialect:
ODCS type |
Databricks | Snowflake | BigQuery | Postgres |
|---|---|---|---|---|
string/text |
string |
varchar |
string |
text |
integer/long |
bigint |
number |
int64 |
bigint |
decimal/numeric |
decimal(38,9) |
number(38,9) |
numeric |
numeric |
boolean |
boolean |
boolean |
bool |
boolean |
timestamp |
timestamp |
timestamp_ntz |
timestamp |
timestamp |
date |
date |
date |
date |
date |
Pick the dialect from the contract's servers[].type (or, if absent, ask).
Step 4 — Implement the model bodies
Ask the user: "Want me to wire the output-port models to the input ports, or leave the from clauses as TODOs?" Default to wiring them. If the user declines, skip this step and continue with the next one.
For each output port table:
Declare each candidate input port as a dbt source — one file per agreement, plus a trust-snapshot of the upstream contract. For every agreement from Step 3.2:
Fetch the provider data product (
entropy-data dataproducts get <provider-data-product-id> -o yaml) to resolve the output port'sserver(catalog/schema/table) and linked contract id.Fetch the contract (
entropy-data datacontracts get <provider-contract-id> -o yaml) for columns.Write the fetched contract to
models/input_ports/<provider-output-port-id>.odcs.yaml. This is a cached snapshot of what we trust upstream to produce — it letsgit logshow when upstream's schema or quality rules changed under us. Do not hand-edit this file; the next run of this skill will refresh it from the platform. If the file already exists and the upstream contract has changed, surface the diff (so the user sees the drift) and ask before overwriting.Write the dbt source file to
models/input_ports/<provider-output-port-id>.source.yaml:version: 2 sources: - name: <provider-data-product-id>_<provider-output-port-id> database: <output port server.catalog> schema: <output port server.schema> config: meta: data_contract: id: <provider-contract-id> file: models/input_ports/<provider-output-port-id>.odcs.yaml tables: - name: <table> # from the contract's `models:` key — one entry per table in the contract description: <from contract> columns: - name: <col> description: <from contract> data_type: <warehouse type from the type map in Step 3>
The
sources[].namecombines<provider-data-product-id>_<provider-output-port-id>so it stays unique across agreements (two agreements with the same provider data product but different output ports do not collide). Eachtables:entry comes from the provider contract'smodels:block — a contract can declare multiple tables, and each one becomes a row here. Themeta.data_contractreference goes on the source element (one contract per output port), not on each table, and points at the local snapshot written in sub-step 3.One pair of files (
*.odcs.yaml+*.source.yaml) per agreement. Do not merge multiple agreements into a single file — each access grant should be independently visible ingit logand easy to remove when revoked. If either file already exists for the same<provider-output-port-id>, surface the diff and ask before overwriting.Match input columns to output columns, in this order. Stop at the first signal that yields exactly one candidate.
- Same semantic concept — both columns declare a
type: semanticsentry inauthoritativeDefinitionswhose URL ends in the same path segment after normalization (lowercase, strip non-alphanumeric — so…/processedTimestampmatches…/processed_timestamp). Scheme, host, and org-id prefix differences don't disqualify. - Same name (case-insensitive).
- Token superset — tokenize both names on
_and case boundaries; the shorter side's tokens are all contained in the longer side's. Covers patterns like<X>_NAME⊃<x>,<DOMAIN>_<X>⊃<x>,<X>_TIMESTAMP⊃timestamp. Generic single-token output names (id,name,type,value,code,key) need a second signal — require (1) or a description echo (the upstream column's description names the output concept) before treating as a hit.
If exactly one upstream column matches, project
cast(<input_col> as <warehouse_type>) as <output_name>. If multiple match, writecast(null as <type>) as <output_name> -- TODO: candidates: <names>. If none match, writecast(null as <type>) as <output_name> -- TODO: source <description>.- Same semantic concept — both columns declare a
Write the SQL body.
- Single input source, columns match 1:1 → replace the TODO
fromwithfrom {{ source('<provider-data-product-id>_<provider-output-port-id>', '<table>') }}(the first arg matchessources[].name, the second matchestables[].name— i.e. the contract's model key — from the source file written in step 4.1) and project each output column withcast(<input_col> as <warehouse_type>) as <output_col>. - Multiple input sources → leave the join logic as an inline TODO listing each candidate
{{ source(...) }}reference and the join keys the user will need to confirm. Do not invent join predicates. - Derived / aggregated columns (sums, ratios, windows implied by the contract description but not present in any input) → leave as
null as <col>with a-- TODO: compute <description from contract>comment.
- Single input source, columns match 1:1 → replace the TODO
Compile to verify. Run
dbt parse(cheap, no warehouse roundtrip) to catch syntax errors and unknown source references. If it fails, fix the generated SQL before continuing. Do not rundbt run— that touches the warehouse and is the user's call.
Step 5 — Stamp the data product as builder-managed
Check DATA_PRODUCT.customProperties for an entry with property: "dataProductBuilder" and value: "https://github.com/entropy-data/dataproduct-builder-dbt". If it is already there, skip this step.
If missing, update the data product on Entropy Data so the platform records that it is managed by this builder. Do not rebuild the ODPS from a template — preserve every other field as fetched in Step 1.
- Save the fetched data product to a temp file as YAML:
entropy-data dataproducts get <DATA_PRODUCT_ID> -o yaml > /tmp/<DATA_PRODUCT_ID>.odps.yaml. - Append to the top-level
customPropertieslist (create the list if absent):customProperties: - property: "dataProductBuilder" value: "https://github.com/entropy-data/dataproduct-builder-dbt" - Push the patched file back:
entropy-data dataproducts put <DATA_PRODUCT_ID> --file /tmp/<DATA_PRODUCT_ID>.odps.yaml. - Delete the temp file.
Forks of this plugin should substitute their own builder URL.
Step 6 — Hand off to entropy-data-sync
Call the entropy-data-sync skill (in this same plugin) so any missing publishing artifacts get created (<id>.odps.yaml, openlineage.yml, .github/workflows/data-product.yml). Pass the parameters you already resolved in Step 1 so the user is not re-asked.
If <id>.odps.yaml already exists locally and disagrees with the fetched data product, do not overwrite — surface the diff and ask.
Step 7 — Final report
End with this two-part recap. Use the same Status enum the other skills use: created, updated, already present, deferred, skipped.
Part 1 — outcome table. One row per output port implemented.
| Artifact | Status | Details |
|---|---|---|
| Data product | already present | <DATA_PRODUCT_ID> — fetched from platform |
dataProductBuilder customProperty |
… | "added — pushed to Entropy Data" / "already present" |
Output-port data contract <CONTRACT_ID> |
… | written to models/output_ports/v<N>/<contract_id>.odcs.yaml |
Contract validation (<server-type>) |
… | "passed" / "normalized & republished: <property> × N" / "issues found, user declined fix" / "skipped (no rules for <server-type>)" |
| Input-port data contracts | … | models/input_ports/<provider-output-port-id>.odcs.yaml — <N> files written / refreshed (trust snapshots, one per active access agreement) / skipped |
| Input port sources | … | models/input_ports/<provider-output-port-id>.source.yaml — <N> files written (one per active access agreement) / skipped |
Model <table>.sql |
… | models/output_ports/v1/<table>.sql — "wired to <source>" / "join TODO" / "skipped per user" |
_models.yml entry for <table> |
… | tests derived from the contract |
dbt parse |
… | "passed" / "failed: |
entropy-data-sync handoff |
… | "ran" / "skipped" — see sync's own report for ODPS/OpenLineage/workflow rows |
Part 2 — next steps. Bullet list, include only what applies:
- For each model with a join or derived-column TODO, list the inputs and the missing logic — one bullet per
<table>.sql. - Run
dbt runanddbt testlocally to verify the generated models compile and pass the contract-derived tests. - Run the contract test on the output ports against your warehouse:
datacontract test models/output_ports/v<N>/<file>.odcs.yamlfor each contract. - Any deferred items from the sync skill's report.
If there is nothing in Part 2, write a single line: No further action required.
Constraints
- Contract is source of truth for schema, not logic. Generate column names, types, and tests from the contract. When wiring SQL bodies in Step 4, project and cast only — do not invent join predicates, aggregations, or column derivations. Anything not directly mappable from an input column stays a TODO.
- Don't overwrite existing dbt SQL files. If
models/output_ports/v1/<table>.sqlalready exists, surface the diff and ask before changing. - Idempotent: re-running the skill with the same data product id should be a no-op when contract and local files already agree.
- Do not commit or push — leave VCS state to the user.