Test ODCS data contracts against the live server
Run the Data Contract CLI (datacontract test) against contracts in the project to check whether the data currently produced by a warehouse still matches the schema and quality rules declared in the contract.
Two kinds of contracts live in this project and they test against different warehouses:
- Output-port contracts at models/output_ports/v<N>/*.odcs.yaml: what this data product commits to produce. They test against this project's warehouse. A failure means we are no longer producing what we promised.
- Input-port contracts at models/input_ports/*.odcs.yaml: cached snapshots of what we trust upstream to produce. They test against the upstream provider's warehouse, using a server block from upstream's ODCS. A failure means upstream drifted from the contract we trusted; the consequence is that our output may break too. Treat input-port failures as an upstream incident, not a local bug.
When to use this vs. other skills
- You changed a contract and want to know if the edit breaks consumers → use datacontract-edit (it edits, tests, and classifies the failure as breaking-or-not).
- You want to verify existing contracts against current data, no edits → this skill.
- A CI run failed the contract test step → this skill, to reproduce locally with --logs.
How to run this skill
${PLUGIN_ROOT} below refers to the root of this plugin: the directory that contains skills/. On Claude Code it is set automatically as ${CLAUDE_PLUGIN_ROOT}; use that. On any other agent (Codex, Copilot CLI, etc.) it is unset; resolve it as ../.. relative to this SKILL.md file's directory (i.e. the grandparent of skills/<this-skill>/).
Plan announcement (before Step 0)
Before running Step 0, print this plan to the user verbatim:
Running datacontract-test. I'll:
- Pre-checks: confirm the datacontract CLI is on PATH and the server credentials are available.
- Pick which contract(s) to test; defaults to all models/output_ports/**/*.odcs.yaml and models/input_ports/*.odcs.yaml.
- Pick the server (defaults to production if the contract has one).
- Run datacontract test per contract and capture the result.
- Report pass/fail with per-rule detail; flag missing credentials separately from real failures.
Then proceed.
Step 0 — Pre-checks
- Confirm uv run --quiet datacontract --version succeeds from the project root. If it fails, run uv sync (the bootstrap template seeds datacontract-cli[all] as a dev dep in pyproject.toml) and retry. If uv sync still doesn't make it available, stop and tell the user to verify datacontract-cli[all] is listed in pyproject.toml's [dependency-groups].dev. Do not propose uv tool install here; a per-project venv is the convention.
- Confirm at least one *.odcs.yaml exists under models/output_ports/**/ or models/input_ports/. If not, stop and tell the user there's nothing to test.
- For each contract that will run, inspect its servers block and list the env vars the chosen server type needs (e.g. DATACONTRACT_SNOWFLAKE_USERNAME / ..._PASSWORD, DATACONTRACT_DATABRICKS_TOKEN, DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH). If any are unset, surface the list to the user and ask whether to continue (the CLI will fail fast on that server) or stop. Do not try to source credentials yourself.
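The credential pre-check can be sketched as a small shell helper. This is only an illustration: missing_env_vars is a hypothetical name, not part of the CLI, and it relies on printenv, so it sees exported variables only (the same ones a child process such as the CLI would see). It reports names and never prints values.

```shell
# Hypothetical helper: print the name of every listed variable that is
# unset or empty in the environment. Values are never echoed.
missing_env_vars() {
  for name in "$@"; do
    if [ -z "$(printenv "$name" 2>/dev/null)" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

For a Snowflake server, a call might look like missing_env_vars DATACONTRACT_SNOWFLAKE_USERNAME DATACONTRACT_SNOWFLAKE_PASSWORD; an empty result means both are set.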
Step 1 — Select contracts
- If the user named a specific contract file or data product id, resolve it to one file. Search both models/output_ports/**/*.odcs.yaml and models/input_ports/*.odcs.yaml.
- If the user said "output contracts" / "input contracts" / "upstream drift", scope to one of those globs.
- If they didn't, default to all ODCS files under both globs. List them, grouped by Output ports and Input ports so the user sees the two roles, then ask before running.
- Remember the resolved list as CONTRACTS. For each entry, also remember its role (output or input); Step 4 surfaces failures differently.
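One way to build the default list with roles attached is sketched below. list_contracts is a hypothetical helper, and it assumes output-port contracts sit exactly one directory below models/output_ports/ (the v<N> layout described above); deeper nesting would need find instead of a glob.

```shell
# Hypothetical helper: print "role<TAB>path" for each contract file,
# output ports first, then input ports.
list_contracts() {
  root="${1:-.}"
  # Output ports: one version directory (v<N>) below output_ports/.
  for f in "$root"/models/output_ports/*/*.odcs.yaml; do
    if [ -f "$f" ]; then printf 'output\t%s\n' "$f"; fi
  done
  # Input ports: files directly under input_ports/.
  for f in "$root"/models/input_ports/*.odcs.yaml; do
    if [ -f "$f" ]; then printf 'input\t%s\n' "$f"; fi
  done
  return 0
}
```

The [ -f ] guard skips the literal pattern that the shell leaves behind when a glob matches nothing, so an empty result means there is nothing to test.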
Step 2 — Select the server
For each contract in CONTRACTS:
- If the contract has exactly one server, use it.
- If it has multiple, default to production. If production isn't defined, ask the user which one.
- Only pass --server all if the user explicitly asks to test every server.
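The selection rule above is small enough to write down as a sketch. pick_server is a hypothetical helper that takes the server names already read from one contract; the return codes (1 for no servers, 2 for "ask the user") are assumptions made for this illustration, and the no-servers case is not covered by the rules above.

```shell
# Hypothetical helper. Arguments: the server names defined in one contract.
# Prints the server to use, or returns 2 when the user has to choose.
pick_server() {
  if [ "$#" -eq 0 ]; then
    return 1                    # assumption: no servers means nothing to test
  fi
  if [ "$#" -eq 1 ]; then
    printf '%s\n' "$1"          # exactly one server: use it
    return 0
  fi
  for name in "$@"; do
    if [ "$name" = "production" ]; then
      printf 'production\n'     # several servers: default to production
      return 0
    fi
  done
  return 2                      # several servers, none named production: ask
}
```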
Step 3 — Run the test
For each contract:
uv run datacontract test <path-to-contract>.odcs.yaml --server <server> --logs
Where <path-to-contract> is the file resolved in Step 1 — typically models/output_ports/v<N>/<file>.odcs.yaml for output contracts, or models/input_ports/<file>.odcs.yaml for input contracts. The CLI does not care which directory; the role only matters for how Step 4 reports the result.
- --logs ensures per-rule failure detail is in stdout; without it the CLI only prints a summary.
- If the user asks for a persisted report (e.g. to attach to a PR), add --output ./test-results/<contract>.xml --output-format junit.
- If the user asks to publish results back to Entropy Data (matches the generated CI workflow), add --publish $API/test-results where $API is the Entropy Data host. Don't publish by default; it writes server-side state.
- Capture stdout and exit code per contract. Non-zero exit means at least one rule failed.
Run sequentially, not in parallel — the warehouse is the bottleneck and parallel runs muddy the log output.
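A sequential run that keeps each contract's log and exit code might look like the sketch below. run_one and run_all are hypothetical names, LOG_DIR is an assumed variable, and for brevity the sketch uses one server for every contract, whereas Step 2 resolves the server per contract. Log files are named after the contract's base name, so two versions of the same file would overwrite each other.

```shell
# Hypothetical wrapper around the Step 3 command for a single contract.
run_one() {
  uv run datacontract test "$1" --server "$2" --logs
}

# Run the contracts one after another. Prints "contract<TAB>exit<TAB>log"
# per contract and returns non-zero if any run exited non-zero.
run_all() {
  server="$1"; shift
  log_dir="${LOG_DIR:-./test-results}"
  mkdir -p "$log_dir"
  failed=0
  for contract in "$@"; do
    log="$log_dir/$(basename "$contract" .odcs.yaml).log"
    rc=0
    run_one "$contract" "$server" >"$log" 2>&1 || rc=$?
    printf '%s\t%s\t%s\n' "$contract" "$rc" "$log"
    if [ "$rc" -ne 0 ]; then failed=$((failed + 1)); fi
  done
  [ "$failed" -eq 0 ]
}
```

Keeping one log file per contract avoids interleaved output and gives Step 4 something to quote from.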
Step 4 — Report
End with this two-part recap. Use the shared Status enum (created, updated, already present, deferred, skipped); for this skill the relevant statuses are passed, failed, and skipped (missing creds).
Part 1 — outcome table. One row per contract tested. Group the rows: output-port contracts first, then input-port contracts under a sub-header (so the reader sees the two roles at a glance).
| Contract | Role | Server | Result | Failures | Details |
|---|---|---|---|---|---|
| <contract-file> | output / input | <server> | passed / failed / skipped | count or — | one line per failing rule (field + rule), or "missing env var: …" if skipped |
Part 2 — next steps. Bullet list, include only what applies. Treat output vs. input failures differently:
- Output-port failures: surface the field and the violated check (e.g. orders.order_id: not_null violated for 17 rows). The fix is in this project: either the dbt model is wrong, the contract is wrong, or the data is wrong. If the user wants a follow-up SQL to find the offending rows, suggest the shape but don't run it. If failures look like they came from a contract edit (rules tightening), point at datacontract-edit to classify breaking-vs-additive.
- Input-port failures: this is upstream drift. Name the provider data product and output port (from the contract id and file name). The fix is not in this project; the user should contact the upstream owner, and in the meantime expect downstream output-port failures. Suggest re-running dataproduct-implement once upstream republishes a corrected contract, so the cached snapshot under models/input_ports/ refreshes.
- For each skipped row, the exact env vars the user needs to set, and where to get them (usually the warehouse admin or entropy-data connection get).
- If failures look like a data quality issue (rules unchanged, data drifted), suggest investigating the upstream of the failing model; this skill does not auto-fix data.
If everything passed, write a single line: All <N> contracts pass against <server>.
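The Part 1 table can be rendered from tab-separated records, as in this sketch. report_table is a hypothetical helper; it assumes one record per line with the six fields in table order and output-port rows fed in first. Because tab counts as whitespace for read, an empty field should be written as a placeholder rather than left blank.

```shell
# Hypothetical helper: read tab-separated records on stdin
# (contract, role, server, result, failures, details) and print the table.
report_table() {
  printf '| Contract | Role | Server | Result | Failures | Details |\n'
  printf '|---|---|---|---|---|---|\n'
  while IFS="$(printf '\t')" read -r contract role server result failures details; do
    printf '| %s | %s | %s | %s | %s | %s |\n' \
      "$contract" "$role" "$server" "$result" "$failures" "$details"
  done
}
```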
Authentication examples by server type
The Data Contract CLI reads credentials from environment variables, not from the contract file. Only the connection topology (host, database, schema, etc.) belongs in the servers block. The examples below cover the most common warehouses. Other types (Oracle, MySQL, Trino, DuckDB, Kafka, ...) follow the same pattern; see the Data Contract CLI README for the full list.
Snowflake
ODCS server block:
servers:
  production:
    type: snowflake
    account: abcdefg-xn12345
    database: ORDER_DB
    schema: ORDERS_PII_V2
Any env var prefixed DATACONTRACT_SNOWFLAKE_ is forwarded to the Snowflake connector with the prefix stripped and the rest lowercased, so you can pass any Snowflake/Soda parameter this way. Three auth modes:
Password auth
export DATACONTRACT_SNOWFLAKE_USERNAME=...
export DATACONTRACT_SNOWFLAKE_PASSWORD=...
export DATACONTRACT_SNOWFLAKE_WAREHOUSE=COMPUTE_WH
export DATACONTRACT_SNOWFLAKE_ROLE=DATA_CONTRACT_TEST
Private key (JWT) auth — used for service accounts and CI:
export DATACONTRACT_SNOWFLAKE_USERNAME=SVC_DATACONTRACT
export DATACONTRACT_SNOWFLAKE_AUTHENTICATOR=SNOWFLAKE_JWT
export DATACONTRACT_SNOWFLAKE_PRIVATE_KEY_PATH=/secrets/snowflake_rsa.p8
# Only if the key is encrypted:
export DATACONTRACT_SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=...
export DATACONTRACT_SNOWFLAKE_WAREHOUSE=COMPUTE_WH
export DATACONTRACT_SNOWFLAKE_ROLE=DATA_CONTRACT_TEST
External browser SSO — interactive, for local runs against an IdP-backed account:
export DATACONTRACT_SNOWFLAKE_USERNAME=jane.doe@example.com
export DATACONTRACT_SNOWFLAKE_AUTHENTICATOR=externalbrowser
export DATACONTRACT_SNOWFLAKE_WAREHOUSE=COMPUTE_WH
export DATACONTRACT_SNOWFLAKE_ROLE=DATA_CONTRACT_TEST
Not usable in CI — it opens a browser window.
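The prefix rule described at the start of this section (strip DATACONTRACT_SNOWFLAKE_, lowercase the rest) can be illustrated with a one-line helper. snowflake_param_name is a hypothetical name and this is not the CLI's own code; it only shows which connector parameter a given variable name would map to under that rule.

```shell
# Illustration of the naming rule only: strip the prefix, lowercase the rest.
snowflake_param_name() {
  printf '%s\n' "${1#DATACONTRACT_SNOWFLAKE_}" | tr '[:upper:]' '[:lower:]'
}
```

Under that rule, DATACONTRACT_SNOWFLAKE_PRIVATE_KEY_PATH maps to private_key_path and DATACONTRACT_SNOWFLAKE_ROLE maps to role.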
Databricks
ODCS server block:
servers:
  production:
    type: databricks
    host: adb-1234567890.7.azuredatabricks.net # optional, can also come from env
    catalog: acme_catalog_prod
    schema: orders_latest
The datacontract CLI does not share auth state with the databricks CLI — a token must be supplied explicitly via DATACONTRACT_DATABRICKS_TOKEN. When surfacing missing credentials to the user, recommend the OAuth-first path; fall back to PAT only when OAuth isn't available.
Recommended — short-lived OAuth from the already-authenticated databricks CLI:
export DATACONTRACT_DATABRICKS_TOKEN=$(databricks auth token | jq -r .access_token)
export DATACONTRACT_DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/<warehouse-id>
Token is valid ~1h, the literal value never lands in shell history, and a leaked token expires before most attackers notice — much smaller blast radius than a long-lived PAT.
Fallback — Personal Access Token (use when databricks auth token isn't available: PAT-only profile, OAuth refresh issue, headless shell):
export DATACONTRACT_DATABRICKS_TOKEN=dapi...
export DATACONTRACT_DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/<warehouse-id>
A PAT is long-lived until rotated. Scope it narrowly (read access to the data product's schema is enough) and avoid putting the export in .bashrc/.zshrc, where the token sits in plaintext on disk; typing the export at a prompt also leaves the literal value in shell history.
CI — use a service-principal-issued token (M2M OAuth, or an SP-owned PAT), not a personal one, with SELECT scoped to the data product's schema. Set as a repository secret named DATACONTRACT_DATABRICKS_TOKEN.
Optional env vars:
export DATACONTRACT_DATABRICKS_SERVER_HOSTNAME=adb-... # only needed if `host` is not in the server block
Postgres
ODCS server block:
servers:
  production:
    type: postgres
    host: db.example.internal
    port: 5432
    database: analytics
    schema: public
Env vars:
export DATACONTRACT_POSTGRES_USERNAME=datacontract_ro
export DATACONTRACT_POSTGRES_PASSWORD=...
Both are required. Use a read-only role.
Amazon Athena
ODCS server block:
servers:
  production:
    type: athena
    catalog: awsdatacatalog # optional, default is awsdatacatalog
    schema: orders_db
    regionName: eu-central-1
    stagingDir: s3://acme-athena-results/datacontract/
Env vars:
export DATACONTRACT_S3_ACCESS_KEY_ID=AKIA... # required
export DATACONTRACT_S3_SECRET_ACCESS_KEY=... # required
export DATACONTRACT_S3_REGION=eu-central-1 # optional, overrides regionName
export DATACONTRACT_S3_SESSION_TOKEN=... # optional, for STS temporary creds
The IAM principal needs athena:* on the workgroup, glue:Get* on the catalog, and read/write on the stagingDir bucket prefix.
BigQuery
ODCS server block:
servers:
  production:
    type: bigquery
    project: acme-data-prod
    dataset: orders
Two auth modes:
Service account key file
export DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH=/secrets/bq-sa.json
Application Default Credentials (ADC) — no env vars needed. Used automatically when DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH is unset. Works with gcloud auth application-default login for local runs and with Workload Identity Federation in CI.
Optional impersonation:
export DATACONTRACT_BIGQUERY_IMPERSONATION_ACCOUNT=datacontract@acme-data-prod.iam.gserviceaccount.com
The principal needs bigquery.dataViewer on the dataset and bigquery.jobUser on the project.
Microsoft Fabric (SQL Server protocol)
Fabric Warehouse and Lakehouse SQL endpoints speak the SQL Server wire protocol, so use type: sqlserver.
ODCS server block:
servers:
  production:
    type: sqlserver
    host: abc123def.datawarehouse.fabric.microsoft.com
    port: 1433
    database: orders_wh
    schema: dbo
    driver: ODBC Driver 18 for SQL Server
Fabric only accepts Entra ID (Azure AD) auth, not SQL logins. Pick one of:
Service principal — for CI:
export DATACONTRACT_SQLSERVER_AUTHENTICATION=ActiveDirectoryServicePrincipal
export DATACONTRACT_SQLSERVER_CLIENT_ID=<app-registration-client-id>
export DATACONTRACT_SQLSERVER_CLIENT_SECRET=<client-secret>
User password — Entra ID username + password (no MFA):
export DATACONTRACT_SQLSERVER_AUTHENTICATION=ActiveDirectoryPassword
export DATACONTRACT_SQLSERVER_USERNAME=jane.doe@acme.com
export DATACONTRACT_SQLSERVER_PASSWORD=...
Interactive — opens a browser, for local dev only:
export DATACONTRACT_SQLSERVER_AUTHENTICATION=ActiveDirectoryInteractive
export DATACONTRACT_SQLSERVER_USERNAME=jane.doe@acme.com
The same env vars work for a regular on-prem SQL Server; set DATACONTRACT_SQLSERVER_AUTHENTICATION=sql and supply DATACONTRACT_SQLSERVER_USERNAME / DATACONTRACT_SQLSERVER_PASSWORD.
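Install ODBC Driver 18 locally before running (brew install msodbcsql18 on macOS, apt-get install msodbcsql18 on Debian/Ubuntu). Both commands need Microsoft's package source configured first: the Microsoft Homebrew tap on macOS, the Microsoft apt repository on Debian/Ubuntu.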
Constraints
- Read-only against the warehouse. This skill runs datacontract test, which executes SELECT queries; it never writes. Do not invoke datacontract publish, datacontract export, or entropy-data datacontracts put from this skill.
- No edits to contracts or models. If a test fails, surface it; do not auto-patch the contract to make it pass. That defeats the purpose.
- No credential sourcing. If env vars are missing, tell the user; don't read them from .env, ~/.aws, or anywhere else on the user's behalf.
- Idempotent: re-running the skill produces the same report against the same data. Failures from rules that depend on time (freshness, row-count windows) are expected to drift; note that in the failure detail when relevant.