Data as a Product

Treating data as a product works when you fuse continuous discovery habits (Teresa Torres: outcomes, opportunity solution trees, assumption testing) with a platform that enforces contracts, SLOs, and governance (catalogs, open table formats, policy-as-code, observability).

The result: faster iteration with lower risk, measurable consumer value, and fewer data swamp traps.

What “Data as a Product” Really Means

It’s not “put a pretty UI on a dataset.” It’s an operating model:

Clear outcome: The business metric the product moves (e.g., reduce underwriting cycle time by 20%).
Explicit customers: Personas + jobs-to-be-done.
Accountable owner: PM + TL owning roadmap, SLOs, cost.
Product spec: Schema, semantics, lineage, policies, interface, SLOs, versioning.
Usage analytics: Feedback loop to evolve the product continuously.

Layering in Teresa Torres’ Playbook

Teresa Torres’ Continuous Discovery Habits was instrumental in helping me frame the data-as-a-product concept with a real-world, socio-technical implementation via Data Mesh. The book gave me the cadence and artifacts to make this real:

Outcomes over Outputs

Move a metric, not a backlog. Pick a North Star metric plus supporting guardrail metrics (e.g., time-to-insight, adoption, trust/quality, cost-to-serve).

Opportunity Solution Tree (OST)

Map the path from goal → customer opportunities → solutions → experiments:

– Opportunities come from interviews, shadowing analysts, instrumentation (failed queries, schema drift pain, lineage gaps).
– Solutions are candidate changes to the data product (new features, derived tables, semantic layer additions, documentation, policy changes).
– Experiments de-risk assumptions before heavy engineering.

Assumption testing:

Turn risky beliefs into tests:

* “Analysts will trust this model if we expose column-level lineage” → Ship a thin slice with lineage UI + usage tracking and interview users.
* “We can keep PII out of the share” → Policy-as-code + synthetic data run; verify masking works under real queries.
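The masking test above can be sketched as a small script run against synthetic data. This is a minimal illustration, not a platform feature: `mask_email`, `apply_share_policy`, and `find_pii_leaks` are hypothetical names, and the synthetic rows are made up for the example.

```python
import hashlib
import re

# Loose pattern for "looks like an email"; an assumption for this sketch.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def mask_email(value: str) -> str:
    """Hypothetical masking rule: swap an email for a stable, opaque token."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"masked_{digest}"

def apply_share_policy(row: dict) -> dict:
    """Return the row as a consumer of the share would see it."""
    masked = dict(row)
    masked["email"] = mask_email(row["email"])
    return masked

def find_pii_leaks(rows: list[dict]) -> list[tuple[int, str]]:
    """Scan every string field for anything that still looks like an email."""
    leaks = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, str) and EMAIL_RE.search(value):
                leaks.append((index, column))
    return leaks

# Synthetic records; the second one hides an email in a free-text column.
synthetic_rows = [
    {"applicant_id": 1, "email": "ana@example.com", "notes": "called twice"},
    {"applicant_id": 2, "email": "bo@example.com", "notes": "reach at bo@example.com"},
]
shared_rows = [apply_share_policy(row) for row in synthetic_rows]
leaks = find_pii_leaks(shared_rows)
print(leaks)
```

The point of the test is that masking the obvious column is not enough: the scan flags the free-text `notes` field, which is exactly the kind of belief ("we can keep PII out of the share") worth disproving before heavy engineering.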

Continuous interviews

– Weekly/lightweight sessions with real consumers (BI authors, modelers, ops) + a rolling interview repo (clips, tags, quotes).

Technical Contract of a Data Product

Publish a machine-checkable contract alongside human-readable docs:

Interface:

– Schema → Names, types, and nullability, plus semantics (business definitions), keys, and SLAs (freshness, completeness, availability).
– Read methods → SQL endpoint(s), governed share, API, event stream.
– Versioning → Semantic version for breaking/non-breaking changes; deprecation windows.
– Access policy → Row/column filters, masking, retention, export controls.
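The versioning rule can be made machine-checkable with a schema diff. The sketch below assumes a simplified schema representation (column name mapped to type and nullability) and one possible set of compatibility rules; `classify_change` and `bump` are illustrative names, not part of any catalog or table-format API.

```python
# Hypothetical schema representation: column name -> (type, nullable).
Schema = dict[str, tuple[str, bool]]

def classify_change(old: Schema, new: Schema) -> str:
    """Classify a schema change as 'major', 'minor', or 'patch'.

    Assumed rules: removing a column, changing a type, or making a
    non-nullable column nullable breaks readers; adding a column does not.
    """
    for name, (old_type, old_nullable) in old.items():
        if name not in new:
            return "major"
        new_type, new_nullable = new[name]
        if new_type != old_type:
            return "major"
        if new_nullable and not old_nullable:
            return "major"
    if set(new) - set(old):
        return "minor"
    return "patch"

def bump(version: str, level: str) -> str:
    """Apply a semantic-version bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

current = {"applicant_id": ("long", False), "score": ("double", True)}
with_region = {**current, "region": ("string", True)}
without_score = {"applicant_id": ("long", False)}

print(bump("1.6.0", classify_change(current, with_region)))
print(bump("1.6.0", classify_change(current, without_score)))
```

Under these assumed rules, adding a nullable column is a minor bump and dropping a column is a major one, which is what should trigger the deprecation window.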

Quality & SLOs:

– SLOs → e.g., p95 query latency, freshness ≤ 15m, DQ score ≥ 99.5%, lineage coverage 100%.
– Error budgets → Tie outages/breaches to roadmap tradeoffs.
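To make the error-budget idea concrete, here is the arithmetic for an availability SLO. The function names and the burn thresholds in `roadmap_call` are assumptions chosen for illustration; each team should set its own.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of allowed unavailability in the window for an availability SLO."""
    return (1.0 - slo_target) * window_days * 24 * 60

def budget_burn(downtime_minutes: float, slo_target: float, window_days: int) -> float:
    """Fraction of the error budget consumed (1.0 means fully spent)."""
    return downtime_minutes / error_budget_minutes(slo_target, window_days)

def roadmap_call(burn: float) -> str:
    """Illustrative thresholds only, not a standard."""
    if burn >= 1.0:
        return "freeze features, fix reliability"
    if burn >= 0.5:
        return "slow down, prioritize reliability work"
    return "ship as planned"

# 99.9% availability over a rolling 30 days, with 30 minutes of downtime so far.
budget = error_budget_minutes(0.999, 30)
burn = budget_burn(30.0, 0.999, 30)
print(round(budget, 1), round(burn, 2), roadmap_call(burn))
```

A 99.9% target over 30 days leaves roughly 43 minutes of budget, so a single 30-minute outage already consumes most of it.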

Lineage & docs:

– End-to-end lineage (source → transform → product), plus auto-generated docs, examples, and “how to consume” recipes.

Observability:

– Freshness, volume, distribution drift, null spikes, primary-key duplication, join-ability checks, PII leakage detectors.
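Three of these checks (freshness, null spikes, primary-key duplication) fit in a few lines of plain Python. This is a sketch over in-memory rows with hypothetical function names; a real deployment would run equivalent checks inside the pipeline or an observability tool.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def is_fresh(last_loaded_at: datetime, now: datetime, max_age_minutes: int) -> bool:
    """True if the last load is within the freshness target."""
    return now - last_loaded_at <= timedelta(minutes=max_age_minutes)

def null_rate(rows: list[dict], column: str) -> float:
    """Share of rows where the column is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)

def duplicate_keys(rows: list[dict], key: str) -> list:
    """Key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"applicant_id": 1, "score": 0.7},
    {"applicant_id": 2, "score": None},
    {"applicant_id": 2, "score": 0.4},
    {"applicant_id": 3, "score": None},
]
report = {
    "fresh": is_fresh(now - timedelta(minutes=20), now, max_age_minutes=15),
    "score_null_rate": null_rate(rows, "score"),
    "duplicate_keys": duplicate_keys(rows, "applicant_id"),
}
print(report)
```

In this sample the load is 20 minutes old against a 15-minute target, half the scores are null, and `applicant_id` 2 is duplicated, so all three checks would raise an alert.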

Reference Platform (What Good Looks Like)

Storage & Interop

– Open table format → e.g., Iceberg on object storage, for ACID transactions, time travel, and engine independence.
– Portable catalog → The system of record for table metadata, ownership, and tags.

Governance plane

– Catalog + glossary for discovery; lineage graph; classifiers for PII/PCI; policy-as-code (RBAC/ABAC, masking, row filters, retention).
– Key management + tokenization; audit trails.
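Policy-as-code means the row filter and masking rules live as data that the platform evaluates on every read. The sketch below is a toy evaluator under stated assumptions: `AccessPolicy`, `enforce`, and the `pii_reader` role are hypothetical, and real engines apply such rules at query time rather than in application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy: a row filter on region plus column masking."""
    allowed_regions: frozenset
    masked_columns: frozenset
    unmask_role: str = "pii_reader"  # assumed role name

def enforce(rows: list[dict], policy: AccessPolicy, consumer_roles: set) -> list[dict]:
    """Drop rows outside the allowed regions and mask columns for non-privileged roles."""
    can_see_pii = policy.unmask_role in consumer_roles
    visible_rows = []
    for row in rows:
        if row["consumer_region"] not in policy.allowed_regions:
            continue  # row filter
        visible_rows.append({
            column: "***" if (column in policy.masked_columns and not can_see_pii) else value
            for column, value in row.items()
        })
    return visible_rows

policy = AccessPolicy(
    allowed_regions=frozenset({"EU", "UK", "US"}),
    masked_columns=frozenset({"email"}),
)
rows = [
    {"applicant_id": 1, "consumer_region": "EU", "email": "ana@example.com"},
    {"applicant_id": 2, "consumer_region": "APAC", "email": "bo@example.com"},
]
analyst_view = enforce(rows, policy, consumer_roles={"analyst"})
steward_view = enforce(rows, policy, consumer_roles={"analyst", "pii_reader"})
print(analyst_view)
print(steward_view)
```

Both consumers lose the out-of-region row; only the one holding the unmask role sees the email in clear text.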

Sharing & access

– Zero-copy sharing across domains and clouds; dynamic filters for least-privilege consumption.
– Semantic layer (metrics definitions, dimensions, calcs) to stabilize BI/ML against schema churn.

Data contracts & CI/CD

– Contract files in repo (JSON/YAML) + CI gates that block breaking changes, PII regressions, or SLO backslides.
– Blue/green schema rollout with auto-deprecation notices.
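A CI gate of this kind compares the proposed contract with the published one and fails the build on a regression. The sketch below assumes contracts are already parsed into dictionaries with `slo` and `policies` sections like the template later in this post; `contract_violations` and `gate` are hypothetical helpers.

```python
def contract_violations(old: dict, new: dict) -> list[str]:
    """List the regressions a proposed contract introduces over the published one."""
    problems = []
    old_slo, new_slo = old["slo"], new["slo"]
    if new_slo["freshness_minutes_p95"] > old_slo["freshness_minutes_p95"]:
        problems.append("SLO backslide: freshness_minutes_p95 increased")
    if new_slo["dq_score_min"] < old_slo["dq_score_min"]:
        problems.append("SLO backslide: dq_score_min decreased")
    if old["policies"]["pii_masking"] and not new["policies"]["pii_masking"]:
        problems.append("PII regression: pii_masking turned off")
    return problems

def gate(old: dict, new: dict) -> int:
    """Return a process exit code: 0 to pass the pipeline, 1 to block the merge."""
    problems = contract_violations(old, new)
    for problem in problems:
        print(f"BLOCKED: {problem}")
    return 1 if problems else 0

published = {
    "slo": {"freshness_minutes_p95": 15, "dq_score_min": 99.5},
    "policies": {"pii_masking": True},
}
proposed = {
    "slo": {"freshness_minutes_p95": 60, "dq_score_min": 99.5},
    "policies": {"pii_masking": False},
}
exit_code = gate(published, proposed)
```

In a pipeline, the script would end with `sys.exit(gate(published, proposed))` so a non-zero code stops the merge.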

Observability & Telemetry

– Pipeline/runtime monitors + in-product usage: who queries what, top errors, slowest joins, broken dashboards.
– Cost telemetry per product to drive cost-to-serve reductions.

Discovery → Delivery Loop (Cadence)

Weekly

– Customer interviews (30–45 min). Update OST with new opportunities.
– Review usage dashboards (adoption, failures, time-to-first-value).
– Choose 1–2 assumption tests (thin slices, not epics).

Bi-weekly

– Ship thin slices behind feature flags (e.g., new dimension, lineage view, sample notebook).
– Run A/B or shadow comparisons on model changes.
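A shadow comparison runs the candidate logic alongside the current one and diffs the outputs before any consumer is switched over. A minimal sketch, assuming both versions produce a numeric value per entity id; `shadow_compare` and the sample scores are illustrative.

```python
def shadow_compare(current: dict, candidate: dict, tolerance: float = 1e-6) -> dict:
    """Diff two keyed outputs: coverage gaps plus values that differ beyond tolerance."""
    shared = current.keys() & candidate.keys()
    mismatched = sorted(k for k in shared if abs(current[k] - candidate[k]) > tolerance)
    return {
        "missing_in_candidate": sorted(current.keys() - candidate.keys()),
        "extra_in_candidate": sorted(candidate.keys() - current.keys()),
        "mismatched": mismatched,
        "mismatch_rate": len(mismatched) / len(shared) if shared else 0.0,
    }

# Made-up outputs from the current and candidate versions of a model.
current_scores = {1: 0.70, 2: 0.40, 3: 0.55, 4: 0.10}
candidate_scores = {1: 0.70, 2: 0.45, 3: 0.55, 5: 0.90}
result = shadow_compare(current_scores, candidate_scores)
print(result)
```

The mismatch rate and the coverage gaps give the team a concrete number to discuss before promoting the change.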

Monthly

– SLO review + error budget burn; decide on pace vs. reliability.
– Cost review; identify expensive query patterns; push semantics or aggregates closer to the product.

Quarterly

– Outcome check (did we move the North Star?).
– Retire unused features (measured by telemetry), cut cost, simplify schema.

Roles & RACI (Lightweight, but Real)

Data Product Manager (DPM) → Outcomes, OST, contracts, adoption, pricing/cost-to-serve.
Tech Lead / Data Engineer → Pipeline design, performance, rollout, reliability.
Analytics/ML Partner → Consumer proxy, metric definitions, acceptance tests.
Platform Team → Paved road: catalogs, governance, contracts CI, observability.
Data Governance → Define policies; platform enforces them automatically.

Measuring Success (KPIs That Matter)

Time-to-first-value → For new consumers (median days).
Adoption → # active consumers, # dependent products/dashboards.
SLO attainment → Freshness, availability, DQ + error budget burn.
Change lead time → Time from a schema change to production and docs being updated.
Cost-to-serve per query → Unit economics (e.g., per 1k dashboard views).
Trust signals → NPS from analysts, compliance posture, lineage completeness.
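Two of these KPIs are simple to compute once the telemetry exists. The sketch below assumes you can export, per new consumer, the date access was granted and the date of the first successful query, plus a total cost and view count per product; all names and numbers are illustrative.

```python
from datetime import date
from statistics import median

def time_to_first_value_days(onboardings: list[tuple[date, date]]) -> float:
    """Median days between access being granted and the first successful query."""
    return median((first_query - granted).days for granted, first_query in onboardings)

def cost_per_thousand_views(total_cost: float, views: int) -> float:
    """Unit cost of serving the product, expressed per 1k dashboard views."""
    return total_cost * 1000 / views

# Made-up onboarding records: (access granted, first successful query).
onboardings = [
    (date(2024, 3, 1), date(2024, 3, 2)),
    (date(2024, 3, 4), date(2024, 3, 8)),
    (date(2024, 3, 10), date(2024, 3, 12)),
]
ttfv = time_to_first_value_days(onboardings)
unit_cost = cost_per_thousand_views(total_cost=1200.0, views=48000)
print(ttfv, unit_cost)
```

Using the median rather than the mean keeps one slow onboarding from hiding an otherwise healthy trend.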

Templates You Can Steal

Purpose & outcome → The business metric it moves.
Personas & jobs → Who uses it and why.
Contract → Schema link + SLOs + policies.
Getting started → Example queries/notebooks; BI semantic layer mappings.
Change log & roadmap → Versions, deprecations, upcoming bets (from OST).

product: abc_features
version: 1.6.0
owners:
  product_manager: jez@data-lingua.com
  tech_lead: alice@data-lingua.com
interface:
  access: delta_share://risk/underwriting_features
  schema_ref: schema/underwriting_features.avsc
  read_methods: [sql, python, dbt, spark]
  semantics_ref: docs/semantics.md
slo:
  freshness_minutes_p95: 15
  availability_rolling_30d: "99.9%"
  dq_score_min: 99.5
policies:
  pii_masking: true
  row_filters:
    region: consumer_region IN ('EU','UK','US')
versioning:
  deprecation_policy_days: 90
observability:
  checks: [freshness, volume, drift, pk_uniqueness, null_spikes]
 ...
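A contract like this is only useful if something checks it. The sketch below validates the structure of a contract that has already been parsed into a dictionary (here transcribed by hand from part of the template to avoid a YAML parser dependency). The list of required sections and the `validate_contract` helper are assumptions for illustration.

```python
import re

# Assumed minimum set of top-level sections; adjust to your own template.
REQUIRED_SECTIONS = ("product", "version", "owners", "interface", "slo", "policies")
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")

def validate_contract(contract: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the contract passes."""
    errors = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in contract]
    if "version" in contract and not SEMVER_RE.match(str(contract["version"])):
        errors.append(f"version is not MAJOR.MINOR.PATCH: {contract['version']}")
    for role in ("product_manager", "tech_lead"):
        if role not in contract.get("owners", {}):
            errors.append(f"missing owner: {role}")
    return errors

contract = {
    "product": "abc_features",
    "version": "1.6.0",
    "owners": {"product_manager": "jez@data-lingua.com", "tech_lead": "alice@data-lingua.com"},
    "interface": {"schema_ref": "schema/underwriting_features.avsc"},
    "slo": {"freshness_minutes_p95": 15, "dq_score_min": 99.5},
    "policies": {"pii_masking": True},
}
errors = validate_contract(contract)

# A contract with no tech lead and a malformed version fails the check.
broken = {**contract, "version": "1.6", "owners": {"product_manager": "jez@data-lingua.com"}}
broken_errors = validate_contract(broken)
print(errors, broken_errors)
```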

Anti-Patterns (How This Fails)

Data product = Renamed table. No SLOs, no contract, no owner.
Big-bang modeling = Months of design, little discovery → misses real opportunities.
Manual governance boards = Humans approving every change → throughput dies; encode policies in the platform.
Copy-based sharing = CSV exports between clouds → drift, cost, security risk.
Unbounded schemas = Constant churn without semantic layer or versioning → consumer breakage.
