Data as a Product

Treating data as a product works when you fuse continuous discovery habits (Teresa Torres: outcomes, opportunity solution trees, assumption testing) with a platform that enforces contracts, SLOs, and governance (catalogs, open table formats, policy-as-code, observability).

The result: faster iteration with lower risk, measurable consumer value, and fewer data swamp traps.

What “Data as a Product” Really Means

It’s not “put a pretty UI on a dataset.” It’s an operating model:

  • Clear outcome: The business metric the product moves (e.g., reduce underwriting cycle time by 20%).
  • Explicit customers: Personas + jobs-to-be-done.
  • Accountable owner: PM + TL owning roadmap, SLOs, cost.
  • Product spec: Schema, semantics, lineage, policies, interface, SLOs, versioning.
  • Usage analytics: Feedback loop to evolve the product continuously.

Layering in Teresa Torres’ Playbook

Teresa Torres’ Continuous Discovery Habits was instrumental in helping me frame the data-as-a-product concept with a real-world, socio-technical implementation via Data Mesh. It gave me the cadence and artifacts to make this real:

Outcomes over Outputs

Move a metric, not a backlog. Pick a North Star plus supporting guardrail metrics (e.g., time-to-insight, adoption, trust/quality, cost-to-serve).

Opportunity Solution Tree (OST)

Map the path from goal → customer opportunities → solutions → experiments:

– Opportunities come from interviews, shadowing analysts, instrumentation (failed queries, schema drift pain, lineage gaps).
– Solutions are candidate changes to the data product (new features, derived tables, semantic layer additions, documentation, policy changes).
– Experiments de-risk assumptions before heavy engineering.
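
One way to keep the tree reviewable is to hold it as plain data. The sketch below is a minimal, hypothetical representation: the class names, fields, and example entries are illustrative assumptions, not part of Torres' method or of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    assumption: str          # the risky belief being tested
    status: str = "planned"  # planned | running | validated | invalidated


@dataclass
class Solution:
    name: str
    experiments: list[Experiment] = field(default_factory=list)


@dataclass
class Opportunity:
    pain: str    # consumer pain observed during discovery
    source: str  # interview | shadowing | instrumentation
    solutions: list[Solution] = field(default_factory=list)


@dataclass
class OpportunitySolutionTree:
    outcome: str
    opportunities: list[Opportunity] = field(default_factory=list)

    def untested_solutions(self) -> list[str]:
        """Solutions with no experiment attached yet (candidates to de-risk next)."""
        return [
            s.name
            for o in self.opportunities
            for s in o.solutions
            if not s.experiments
        ]


# Illustrative content only.
tree = OpportunitySolutionTree(
    outcome="Reduce underwriting cycle time by 20%",
    opportunities=[
        Opportunity(
            pain="Analysts cannot trace where a feature column comes from",
            source="interview",
            solutions=[
                Solution(
                    name="Column-level lineage view",
                    experiments=[Experiment("Lineage increases analyst trust")],
                ),
                Solution(name="Auto-generated docs"),
            ],
        )
    ],
)
print(tree.untested_solutions())
```

Listing solutions without an experiment is a cheap prompt for the weekly question of which assumption to test next.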

Assumption testing:

Turn risky beliefs into tests:

* “Analysts will trust this model if we expose column-level lineage” → Ship a thin slice with lineage UI + usage tracking and interview users.
* “We can keep PII out of the share” → Policy-as-code + synthetic data run; verify masking works under real queries.
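
The PII test can start as a small script before any policy engine is wired in. The sketch below assumes a hypothetical masking policy expressed as a dict of column name to masking function, and synthetic rows standing in for real data; a real platform would enforce the policy in its own policy-as-code layer.

```python
import hashlib

# Hypothetical masking policy: column name -> masking function.
MASKING_POLICY = {
    "email": lambda v: hashlib.sha256(v.encode()).hexdigest()[:12],
    "national_id": lambda v: "***",
}


def apply_policy(row: dict) -> dict:
    """Return a copy of the row with policy-covered columns masked."""
    masked = {}
    for col, val in row.items():
        if col in MASKING_POLICY and val is not None:
            masked[col] = MASKING_POLICY[col](val)
        else:
            masked[col] = val
    return masked


def leaked_values(raw_rows: list[dict], shared_rows: list[dict]) -> set[str]:
    """Raw PII values that still appear anywhere in the shared output."""
    raw_pii = {
        str(r[c]) for r in raw_rows for c in MASKING_POLICY if r.get(c) is not None
    }
    shared = {str(v) for r in shared_rows for v in r.values()}
    return raw_pii & shared


# Synthetic rows stand in for production data during the experiment.
synthetic = [
    {"applicant_id": 1, "email": "a@example.com", "national_id": "AB123456", "region": "EU"},
    {"applicant_id": 2, "email": "b@example.com", "national_id": "CD789012", "region": "UK"},
]
shared = [apply_policy(r) for r in synthetic]
print(leaked_values(synthetic, shared))  # empty set when masking holds
```

Comparing raw values against the shared output, rather than checking only that a masking function ran, is what makes this a test of the assumption and not of the implementation.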

Continuous interviews

– Weekly/lightweight sessions with real consumers (BI authors, modelers, ops) + a rolling interview repo (clips, tags, quotes).

Technical Contract of a Data Product

Publish a machine-checkable contract alongside human-readable docs:

Interface:

– Schema (names, types, nullability), semantics (business definitions), keys and SLAs (freshness, completeness, availability).
– Read methods: SQL endpoint(s), governed share, API, event stream.
– Versioning: Semantic version for breaking/non-breaking changes; deprecation windows.
– Access policy: Row/column filters, masking, retention, export controls.
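
The versioning rule can be made machine-checkable with a small classifier. The sketch below uses one possible set of rules, stated as assumptions: removing a column, changing a type, or making a previously non-nullable column nullable counts as breaking for readers; adding a column is a minor change. `classify_change` and `bump` are hypothetical helper names.

```python
def classify_change(old: dict, new: dict) -> str:
    """Classify a schema change as 'major', 'minor', or 'patch'.

    Schemas are dicts of column -> {"type": str, "nullable": bool}.
    """
    for col, spec in old.items():
        if col not in new:
            return "major"  # column removed
        if new[col]["type"] != spec["type"]:
            return "major"  # type changed
        if new[col]["nullable"] and not spec["nullable"]:
            return "major"  # readers may now see NULLs they never handled
    if set(new) - set(old):
        return "minor"      # only additions
    return "patch"


def bump(version: str, change: str) -> str:
    """Apply a semantic-version bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


v1 = {
    "applicant_id": {"type": "long", "nullable": False},
    "risk_score": {"type": "double", "nullable": True},
}
v2_added = {**v1, "segment": {"type": "string", "nullable": True}}
v2_dropped = {"applicant_id": v1["applicant_id"]}

print(bump("1.6.0", classify_change(v1, v2_added)))    # additive change
print(bump("1.6.0", classify_change(v1, v2_dropped)))  # breaking change
```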

Quality & SLOs:

– SLOs: e.g., p95 query latency, freshness ≤ 15m, DQ score ≥ 99.5%, lineage coverage 100%.
– Error budgets: Tie outages/breaches to roadmap tradeoffs.
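
To make the error budget concrete: an availability target of 99.9% over a rolling 30 days leaves 0.1% of 43,200 minutes, about 43.2 minutes, of allowed unavailability. The sketch below works that arithmetic through; the pace thresholds in `suggested_pace` are hypothetical and only illustrate how budget burn could feed a roadmap decision.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed minutes of unavailability in the window for a given target."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


def suggested_pace(remaining: float) -> str:
    # Hypothetical thresholds; each team would agree its own.
    if remaining <= 0:
        return "freeze features, fix reliability"
    if remaining < 0.5:
        return "slow down, prioritize reliability work"
    return "ship at normal pace"


budget = error_budget_minutes(0.999)  # 99.9% over 30 days
remaining = budget_remaining(0.999, downtime_minutes=30)
print(round(budget, 1), "minutes of budget")
print(round(remaining, 2), suggested_pace(remaining))
```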

Lineage & docs:

– End-to-end lineage: Source → Transform → Product, auto-generated docs, examples, and “how to consume” recipes.

Observability:

– Freshness, volume, distribution drift, null spikes, primary-key duplication, join-ability checks, PII leakage detectors.

Reference Platform (What Good Looks Like)

Storage & Interop

– Open table format (e.g., Iceberg) on object storage for ACID + time-travel + engine independence.
– Portable catalog as system of record (table metadata, ownership, tags).

Governance plane

– Catalog + glossary for discovery; lineage graph; classifiers for PII/PCI; policy-as-code (RBAC/ABAC, masking, row filters, retention).
– Key management + tokenization; audit trails.

Sharing & access

– Zero-copy sharing across domains and clouds; dynamic filters for least-privilege consumption.
– Semantic layer (metrics definitions, dimensions, calcs) to stabilize BI/ML against schema churn.

Data contracts & CI/CD

– Contract files in repo (JSON/YAML) + CI gates that block breaking changes, PII regressions, or SLO backslides.
– Blue/green schema rollout with auto-deprecation notices.
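
A CI gate of this kind can be as simple as diffing the contract on the main branch against the contract in the pull request. The sketch below is a hypothetical check covering SLO backslides and a PII regression; the contracts are plain dicts whose field names follow the sample contract in this post, and loading them from YAML or JSON is left out.

```python
def gate_violations(current: dict, proposed: dict) -> list[str]:
    """Compare two contract versions and list the changes that should block a merge."""
    violations = []
    cur_slo, new_slo = current["slo"], proposed["slo"]

    # A larger freshness bound means staler data is now acceptable.
    if new_slo["freshness_minutes_p95"] > cur_slo["freshness_minutes_p95"]:
        violations.append("SLO backslide: freshness target loosened")
    if new_slo["dq_score_min"] < cur_slo["dq_score_min"]:
        violations.append("SLO backslide: DQ score floor lowered")

    # Masking may be switched on, never silently switched off.
    if current["policies"]["pii_masking"] and not proposed["policies"]["pii_masking"]:
        violations.append("PII regression: masking disabled")
    return violations


current = {
    "slo": {"freshness_minutes_p95": 15, "dq_score_min": 99.5},
    "policies": {"pii_masking": True},
}
proposed = {
    "slo": {"freshness_minutes_p95": 60, "dq_score_min": 99.5},
    "policies": {"pii_masking": False},
}

problems = gate_violations(current, proposed)
for p in problems:
    print("BLOCKED:", p)
# In a CI job, a non-empty list would translate into a non-zero exit code.
```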

Observability & Telemetry

– Pipeline/runtime monitors + in-product usage: who queries what, top errors, slowest joins, broken dashboards.
– Cost telemetry per product to drive cost-to-serve reductions.

Discovery → Delivery Loop (Cadence)

Weekly

– Customer interviews (30–45 min). Update OST with new opportunities.
– Review usage dashboards (adoption, failures, time-to-first-value).
– Choose 1–2 assumption tests (thin slices, not epics).

Bi-weekly

– Ship thin slices behind feature flags (e.g., new dimension, lineage view, sample notebook).
– Run A/B or shadow comparisons on model changes.

Monthly

* SLO review + error budget burn; decide on pace vs. reliability.
* Cost review; identify expensive query patterns; push semantics or aggregates closer to the product.

Quarterly

– Outcome check (did we move the North Star?).
– Retire unused features (measured by telemetry), cut cost, simplify schema.

Roles & RACI (Lightweight, but Real)

  • Data Product Manager (DPM): Outcomes, OST, contracts, adoption, pricing/cost-to-serve.
  • Tech Lead / Data Engineer: Pipeline design, performance, rollout, reliability.
  • Analytics/ML Partner: Consumer proxy, metric definitions, acceptance tests.
  • Platform Team: Paved road (catalogs, governance, contracts CI, observability).
  • Data Governance: Define policies; platform enforces them automatically.

Measuring Success (KPIs That Matter)

  • Time-to-first-value: For new consumers (median days).
  • Adoption: # active consumers, # dependent products/dashboards.
  • SLO attainment: Freshness, availability, DQ + error budget burn.
  • Change lead time: Schema change → production & docs updated.
  • Cost-to-serve per query: Unit economics (e.g., per 1k dashboard views).
  • Trust signals: NPS from analysts, compliance posture, lineage completeness.
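
Two of these KPIs reduce to simple arithmetic once the telemetry exists. The sketch below assumes hypothetical inputs: a list of (access granted, first successful query) dates per consumer, and a monthly cost and view count per product. All figures are made up for illustration.

```python
from datetime import date
from statistics import median


def time_to_first_value_days(onboarding: list[tuple[date, date]]) -> float:
    """Median days between access being granted and the first successful query."""
    return median([(first_query - granted).days for granted, first_query in onboarding])


def cost_per_1k_views(total_cost: float, views: int) -> float:
    """Unit cost of serving the product, normalized per 1,000 dashboard views."""
    return total_cost * 1000 / views


# Made-up telemetry for three new consumers.
onboarding = [
    (date(2024, 1, 2), date(2024, 1, 5)),
    (date(2024, 1, 10), date(2024, 1, 11)),
    (date(2024, 2, 1), date(2024, 2, 8)),
]

print(time_to_first_value_days(onboarding), "days (median)")
print(cost_per_1k_views(total_cost=420.0, views=56_000), "per 1k views")
```

The median is used for time-to-first-value, as the KPI list suggests, so that one slow onboarding does not dominate the figure.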

Templates You Can Steal

  • Purpose & outcome: The business metric it moves.
  • Personas & jobs: Who uses it and why.
  • Contract: Schema link + SLOs + policies.
  • Getting started: Example queries/notebooks; BI semantic layer mappings.
  • Change log & roadmap: Versions, deprecations, upcoming bets (from OST).

product: abc_features
version: 1.6.0
owners:
  product_manager: jez@data-lingua.com
  tech_lead: alice@data-lingua.com
interface:
  access: delta_share://risk/underwriting_features
  schema_ref: schema/underwriting_features.avsc
  read_methods: [sql, python, dbt, spark]
  semantics_ref: docs/semantics.md
slo:
  freshness_minutes_p95: 15
  availability_rolling_30d: "99.9%"
  dq_score_min: 99.5
policies:
  pii_masking: true
  row_filters:
    region: consumer_region IN ('EU','UK','US')
versioning:
  deprecation_policy_days: 90
observability:
  checks: [freshness, volume, drift, pk_uniqueness, null_spikes]
 ...

Anti-Patterns (How This Fails)

  • Data product = Renamed table. No SLOs, no contract, no owner.
  • Big-bang modeling = Months of design, little discovery → misses real opportunities.
  • Manual governance boards = Humans approving every change → throughput dies; encode policies in the platform.
  • Copy-based sharing = CSV exports between clouds → drift, cost, security risk.
  • Unbounded schemas = Constant churn without semantic layer or versioning → consumer breakage.
