Data Contracts

A data contract is the agreement between the team that produces a dataset and the people or systems that consume it. It describes the ownership, structure, semantics, quality, and terms of use of the data — so consumers know exactly what they are getting and can trust it.

Data contracts are the core of Entropy Data: every output port of a data product is specified by a data contract, which is what turns a raw endpoint into something a consumer can rely on.

For a conceptual introduction (why data contracts exist, the contract-first approach, and the problems they solve), see Entropy Data's explainer "What is a data contract?".

What's in a data contract

A data contract captures five dimensions of an agreement:

  • Ownership — which team is responsible for the data and how to reach them.
  • Schema — the structure: models, fields, types, and relationships.
  • Semantics — what the data means, linked to your business definitions.
  • Quality — the rules the data is guaranteed to meet, and how they are tested.
  • Terms of use — pricing, SLAs, licensing, and the conditions for accessing the data.

The Open Data Contract Standard (ODCS)

Entropy Data uses the Open Data Contract Standard (ODCS) — a machine-readable YAML format — as the specification for its data contracts. ODCS is an open standard governed by Bitol, a Linux Foundation AI & Data project; it originated as the "Data Contract Template" used at PayPal. Entropy Data supports ODCS v3.

For an overview of the standard and how Entropy Data implements it, see Open Data Contract Standard. The full specification lives in the Bitol ODCS repository.

ODCS building blocks

An ODCS contract is organized into a few top-level sections:

  • Fundamentals — id, name, version, status, and other metadata.
  • Schema — the models and their fields, including complex (nested) structures.
  • Data quality — quality rules attached to the schema, runnable as tests.
  • SLAs — service-level expectations such as freshness and availability.
  • Servers — the physical locations of the data across environments.
  • Roles — access roles used for role-based access control.
  • Pricing and custom properties — optional cost and organization-specific fields.
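To make the section layout concrete, here is a minimal Python sketch that represents a contract as a plain dictionary, assuming the YAML has already been parsed. The key names (apiVersion, kind, slaProperties, servers, roles, and so on) and the set of required fundamentals reflect our reading of ODCS v3, not a verified field reference. The Orders values and the helper functions are hypothetical and only illustrate how the sections nest.

```python
from typing import Any

# Hypothetical minimal contract. Key names are modeled on ODCS v3 as we
# understand it; verify them against the ODCS specification.
ORDERS_CONTRACT: dict[str, Any] = {
    # Fundamentals
    "apiVersion": "v3.0.2",
    "kind": "DataContract",
    "id": "orders",
    "name": "Orders",
    "version": "1.0.0",
    "status": "active",
    # Schema: one model with two fields
    "schema": [
        {
            "name": "orders",
            "properties": [
                {"name": "order_id", "logicalType": "string", "required": True},
                {"name": "order_total", "logicalType": "number"},
            ],
        }
    ],
    # SLAs, servers, and roles (illustrative values only)
    "slaProperties": [{"property": "latency", "value": 1, "unit": "d"}],
    "servers": [{"server": "production", "type": "postgres"}],
    "roles": [{"role": "orders_reader", "access": "read"}],
}

# Fields this sketch treats as required; an assumption to check against
# the ODCS JSON Schema.
REQUIRED_FUNDAMENTALS = ("apiVersion", "kind", "id", "version", "status")


def missing_fundamentals(contract: dict[str, Any]) -> list[str]:
    """Return the required fundamentals absent from the contract, in order."""
    return [key for key in REQUIRED_FUNDAMENTALS if key not in contract]


def field_names(contract: dict[str, Any]) -> dict[str, list[str]]:
    """Map each model name in the schema section to its field names."""
    return {
        model["name"]: [prop["name"] for prop in model.get("properties", [])]
        for model in contract.get("schema", [])
    }


if __name__ == "__main__":
    print(missing_fundamentals(ORDERS_CONTRACT))  # []
    print(field_names(ORDERS_CONTRACT))
```

Keeping the sections as separate top-level keys is what lets tools validate fundamentals, read the schema, and run quality rules independently of one another.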

Refer to datacontract.com and the ODCS specification for the full field-level reference.

Data quality

Data quality rules are the promises a contract makes about its data — the part that turns a schema into something consumers can trust. In ODCS, quality rules are attached to the schema, at the level of a model or an individual field.

Rules range from simple library checks (row counts, uniqueness, not-null) to custom SQL checks that express a business invariant. For example, the seeded Orders contract carries checks such as Row Count, No Duplicate Order IDs, and Previous Order Referential Integrity. The available rule types and their exact YAML are defined by ODCS — see the ODCS data quality reference and datacontract.com.
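The idea that each rule is "a query plus a threshold" can be sketched in Python against an in-memory SQLite table. This is not how Entropy Data or the Data Contract CLI executes rules; it only illustrates the principle. The rule names come from the Orders contract, but the table layout (order_id, previous_order_id), the SQL, and the thresholds are assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class QualityRule:
    name: str
    sql: str  # must return a single numeric value
    check: Callable[[float], bool]  # threshold the value has to satisfy


# Rule names follow the Orders contract; the SQL and thresholds are illustrative.
RULES = [
    QualityRule("Row Count", "SELECT COUNT(*) FROM orders", lambda n: n > 0),
    QualityRule(
        "No Duplicate Order IDs",
        "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM orders",
        lambda n: n == 0,
    ),
    QualityRule(
        "Previous Order Referential Integrity",
        # Count orders whose previous_order_id points at no existing order.
        """
        SELECT COUNT(*) FROM orders o
        WHERE o.previous_order_id IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM orders p WHERE p.order_id = o.previous_order_id)
        """,
        lambda n: n == 0,
    ),
]


def run_rules(conn: sqlite3.Connection, rules: list[QualityRule]) -> dict[str, bool]:
    """Run each rule's query and record whether its value passes the check."""
    results = {}
    for rule in rules:
        (value,) = conn.execute(rule.sql).fetchone()
        results[rule.name] = bool(rule.check(value))
    return results


def demo_connection() -> sqlite3.Connection:
    """In-memory orders table with one dangling previous_order_id (o3 -> o9)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT NOT NULL, previous_order_id TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("o1", None), ("o2", "o1"), ("o3", "o9")],
    )
    return conn


if __name__ == "__main__":
    for name, passed in run_rules(demo_connection(), RULES).items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```

With this demo data the row-count and duplicate checks pass, while the referential-integrity check fails because order o3 references a nonexistent order o9. The sketch uses NOT EXISTS rather than NOT IN so that a NULL order_id in the table cannot silently hide dangling references.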

Quality rules are not just documentation: Entropy Data runs them against the real data behind an output port and records pass/fail results on the contract. See Test a data contract.

Data contracts in Entropy Data

Within Entropy Data, a data contract is a living artifact you create, edit, test, and version:

  • Specifies output ports — a contract is linked to a data product's output port and is what consumers read before they request access.
  • Authored in the Data Contract Editor — edit a contract through its diagram, form, and YAML views in the browser-based Data Contract Editor (the default for ODCS contracts), with live validation. See Edit a data contract.
  • Created from a source — derive the schema from an existing asset (or import from Excel) instead of typing it by hand, then refine it in the editor. See Put data under contract.
  • Tested against real data — run the contract's data quality rules in the UI or with the Data Contract CLI, locally or in CI/CD. See Test a data contract.
  • Versioned — manage breaking changes with major versions. See Version data contracts.
  • Synced with Git — keep contracts in your repository and auto-push changes, or edit them as code.
  • Rich descriptions — description fields support Markdown and HTML. See Rich Text Formatting.
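Versioning with major versions depends on telling breaking changes apart from compatible ones. The Python sketch below shows one way to do that for a flat field list. The classification is an assumption about what breaks consumers (a field removed, a type changed, or a field that stops being required), and the helpers and the MAJOR.MINOR.PATCH bump policy are hypothetical; Entropy Data's own rules may differ.

```python
from typing import TypedDict


class Field(TypedDict):
    type: str
    required: bool


# Hypothetical versions of an Orders model: v2 retypes order_total
# and adds an optional currency field.
ORDERS_V1: dict[str, Field] = {
    "order_id": {"type": "string", "required": True},
    "order_total": {"type": "number", "required": False},
}
ORDERS_V2: dict[str, Field] = {
    "order_id": {"type": "string", "required": True},
    "order_total": {"type": "string", "required": False},
    "currency": {"type": "string", "required": False},
}


def breaking_changes(old: dict[str, Field], new: dict[str, Field]) -> list[str]:
    """List changes that could break existing consumers (assumed rules)."""
    problems = []
    for name, old_field in old.items():
        new_field = new.get(name)
        if new_field is None:
            problems.append(f"{name}: removed")
            continue
        if new_field["type"] != old_field["type"]:
            problems.append(f"{name}: type {old_field['type']} -> {new_field['type']}")
        if old_field["required"] and not new_field["required"]:
            problems.append(f"{name}: no longer required")
    # Newly added fields are treated as compatible and are not reported.
    return problems


def next_version(version: str, breaking: bool) -> str:
    """Bump the major version for breaking changes, otherwise the minor version."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if breaking:
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.0"


if __name__ == "__main__":
    changes = breaking_changes(ORDERS_V1, ORDERS_V2)
    print(changes)
    print(next_version("1.2.0", breaking=bool(changes)))
```

Here the retyped order_total is reported as the only breaking change, so the sketch proposes 2.0.0. A change that only added the optional currency field would produce 1.3.0 instead.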

Looking for a worked example of the YAML? See Data Contract Examples.

Managing data contracts via API

To create, read, update, and test data contracts programmatically, use the REST API — see the Data Contracts API reference.
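As an illustration of what a programmatic update might look like, the Python sketch below builds, but does not send, an HTTP PUT request carrying a contract as JSON. The base URL, the /api/datacontracts/{id} path, and the x-api-key header are hypothetical placeholders, not confirmed details of the Entropy Data API; take the real host, path, authentication, and payload format from the Data Contracts API reference.

```python
import json
import urllib.request

# Hypothetical values: replace with the host, path, and auth header
# documented in the Data Contracts API reference.
BASE_URL = "https://api.example.com"
API_KEY_HEADER = "x-api-key"


def build_put_contract_request(contract: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates or updates a contract."""
    url = f"{BASE_URL}/api/datacontracts/{contract['id']}"
    body = json.dumps(contract).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
    )


if __name__ == "__main__":
    request = build_put_contract_request({"id": "orders", "version": "1.0.0"}, "my-api-key")
    print(request.get_method(), request.full_url)
```

Sending the request would be a call such as urllib.request.urlopen(request). Separating request construction from sending keeps the sketch testable without network access.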

Learn more