Treating data as a product works when you fuse continuous discovery habits (Teresa Torres: outcomes, opportunity solution trees, assumption testing) with a platform that enforces contracts, SLOs, and governance (catalogs, open table formats, policy-as-code, observability).
The result: faster iteration with lower risk, measurable consumer value, and fewer data swamp traps.
It’s not “put a pretty UI on a dataset.” It’s an operating model:
This book was instrumental in helping me frame the data-as-a-product concept with a real-world, socio-technical implementation via Data Mesh. Teresa Torres’ Continuous Discovery Habits gave me the cadence and artifacts to make this real:
Outcomes over Outputs
Move a metric, not a backlog. Pick a North Star metric plus supporting guardrail metrics (e.g., time-to-insight, adoption, trust/quality, cost-to-serve).
Opportunity Solution Tree (OST)
Map the path from goal → customer opportunities → solutions → experiments:
– Opportunities come from interviews, shadowing analysts, instrumentation (failed queries, schema drift pain, lineage gaps).
– Solutions are candidate changes to the data product (new features, derived tables, semantic layer additions, documentation, policy changes).
– Experiments de-risk assumptions before heavy engineering.
Assumption testing:
Turn risky beliefs into tests:
* “Analysts will trust this model if we expose column-level lineage” → Ship a thin slice with lineage UI + usage tracking and interview users.
* “We can keep PII out of the share” → Policy-as-code + synthetic data run; verify masking works under real queries.
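The second test can be run without touching real customer data. What follows is a minimal sketch, not a production detector: the email regex stands in for a real PII classifier, and `mask_row`, `find_pii_leaks`, and the synthetic rows are all hypothetical names invented for illustration.

```python
import re

# Hypothetical detector: a simple email pattern stands in for a real PII classifier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MASK = "***MASKED***"


def mask_row(row, masked_columns):
    """Return a copy of the row with the listed columns replaced by MASK."""
    return {col: (MASK if col in masked_columns else val) for col, val in row.items()}


def find_pii_leaks(rows):
    """Return (row_index, column) pairs where an email-like string survives."""
    leaks = []
    for index, row in enumerate(rows):
        for col, val in row.items():
            if isinstance(val, str) and EMAIL_RE.search(val):
                leaks.append((index, col))
    return leaks


# Synthetic rows: the second one hides an email inside a free-text column.
synthetic_rows = [
    {"customer_id": 1, "email": "ana@example.com", "notes": "called twice"},
    {"customer_id": 2, "email": "bo@example.com", "notes": "reach me at bo@example.com"},
]

# Apply the masking policy to the declared PII column only, then scan the result.
shared_rows = [mask_row(row, {"email"}) for row in synthetic_rows]
leaks = find_pii_leaks(shared_rows)
print(leaks)  # [(1, 'notes')]
```

The leak in the free-text `notes` column shows why masking should be verified on the shared output, not just on the columns someone remembered to tag.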
Continuous interviews
– Weekly/lightweight sessions with real consumers (BI authors, modelers, ops) + a rolling interview repo (clips, tags, quotes).
Publish a machine-checkable contract alongside human-readable docs:
Interface:
– Schema → names, types, and nullability; semantics (business definitions); keys; and SLAs (freshness, completeness, availability).
– Read methods → SQL endpoint(s), governed share, API, event stream.
– Versioning → Semantic versioning to separate breaking from non-breaking changes; deprecation windows.
– Access policy → Row/column filters, masking, retention, export controls.
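The versioning rule can be made mechanical by diffing two schema versions. Treat this as a sketch under a conservative assumption: any removed column, type change, or nullability change counts as breaking, while an added column counts as non-breaking. The column names and the `classify_change` and `bump` helpers are hypothetical.

```python
def classify_change(old_schema, new_schema):
    """Schemas map column name -> (type, nullable). Returns 'major', 'minor', or 'patch'."""
    removed = old_schema.keys() - new_schema.keys()
    added = new_schema.keys() - old_schema.keys()
    altered = {
        col
        for col in old_schema.keys() & new_schema.keys()
        if old_schema[col] != new_schema[col]
    }
    if removed or altered:
        return "major"  # consumers reading the old shape could break
    if added:
        return "minor"  # additive change, existing queries keep working
    return "patch"      # no schema difference (docs or metadata only)


def bump(version, level):
    """Apply a semantic-version bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


old = {"customer_id": ("long", False), "risk_score": ("double", True)}
added_column = {**old, "segment": ("string", True)}
dropped_column = {"customer_id": ("long", False)}

print(bump("1.6.0", classify_change(old, added_column)))    # 1.7.0
print(bump("1.6.0", classify_change(old, dropped_column)))  # 2.0.0
```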
Quality & SLOs:
– SLOs → e.g., p95 query latency, freshness ≤ 15m, DQ score ≥ 99.5%, lineage coverage 100%.
– Error budgets → Tie outages/breaches to roadmap tradeoffs.
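An error budget is just the amount of SLO violation a target leaves room for. As a rough sketch: a 99.9% availability target over 30 days allows about 43.2 minutes of downtime. The thresholds in `pace_decision` are illustrative assumptions, not a recommendation.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Minutes of allowed unavailability for a target such as 0.999."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


def pace_decision(remaining):
    """Hypothetical policy that ties budget burn to roadmap tradeoffs."""
    if remaining <= 0:
        return "freeze features, fix reliability"
    if remaining < 0.25:
        return "slow down, prioritize reliability work"
    return "keep shipping"


budget = error_budget_minutes(0.999)
remaining = budget_remaining(0.999, downtime_minutes=40)

print(round(budget, 1))          # 43.2
print(pace_decision(remaining))  # slow down, prioritize reliability work
```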
Lineage & docs:
– End-to-end lineage (source → transform → product), auto-generated docs, examples, and “how to consume” recipes.
Observability:
– Freshness, volume, distribution drift, null spikes, primary-key duplication, join-ability checks, PII leakage detectors.
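Three of these checks (freshness, primary-key duplication, null spikes) are small enough to sketch in plain Python. This is an illustration over in-memory rows; a real product would more likely run equivalent checks in SQL or a data quality tool. The function names, the 15-minute limit (borrowed from the freshness SLO example), and the 5-point tolerance are assumptions.

```python
from datetime import datetime, timedelta, timezone


def freshness_ok(last_loaded_at, now, max_age=timedelta(minutes=15)):
    """True when the latest load is within the freshness limit."""
    return now - last_loaded_at <= max_age


def duplicate_keys(rows, key):
    """Return the set of primary-key values that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


def null_rate(rows, column):
    """Share of rows where the column is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


def null_spike(rows, column, baseline_rate, tolerance=0.05):
    """True when the null rate exceeds the baseline by more than the tolerance."""
    return null_rate(rows, column) - baseline_rate > tolerance


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "score": 0.4},
    {"id": 2, "score": None},
    {"id": 2, "score": None},
    {"id": 3, "score": 0.9},
]

print(freshness_ok(now - timedelta(minutes=22), now))  # False
print(duplicate_keys(rows, "id"))                      # {2}
print(null_spike(rows, "score", baseline_rate=0.10))   # True
```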
Storage & Interop
– Open table format → e.g., Iceberg on object storage, for ACID + time-travel + engine independence.
– Portable catalog → the system of record for table metadata, ownership, and tags.
Governance plane
– Catalog + glossary for discovery; lineage graph; classifiers for PII/PCI; policy-as-code (RBAC/ABAC, masking, row filters, retention).
– Key management + tokenization; audit trails.
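Policy-as-code means the row filter and masking rules live as reviewable data and are evaluated per consumer, not hand-applied per query. The sketch below assumes a made-up policy shape (`POLICY`, `enforce`, the `pii_reader` role); real engines express the same idea in their own policy languages.

```python
# Hypothetical policy document: which column filters rows, and which role
# a consumer needs in order to see each sensitive column unmasked.
POLICY = {
    "row_filter": {"column": "consumer_region", "user_attribute": "regions"},
    "masked_columns": {"email": "pii_reader"},
}


def enforce(rows, user, policy=POLICY):
    """Apply the row filter (ABAC) and column masking (RBAC) for one consumer."""
    row_filter = policy["row_filter"]
    allowed_values = set(user.get(row_filter["user_attribute"], ()))
    result = []
    for row in rows:
        if row[row_filter["column"]] not in allowed_values:
            continue  # least privilege: rows outside the user's attributes are dropped
        visible = {}
        for col, val in row.items():
            required_role = policy["masked_columns"].get(col)
            if required_role and required_role not in user["roles"]:
                visible[col] = "***"
            else:
                visible[col] = val
        result.append(visible)
    return result


rows = [
    {"consumer_region": "EU", "email": "ana@example.com", "score": 0.7},
    {"consumer_region": "US", "email": "bo@example.com", "score": 0.2},
]
analyst = {"roles": ["analyst"], "regions": ["EU"]}

print(enforce(rows, analyst))
# [{'consumer_region': 'EU', 'email': '***', 'score': 0.7}]
```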
Sharing & access
– Zero-copy sharing across domains and clouds; dynamic filters for least-privilege consumption.
– Semantic layer (metrics definitions, dimensions, calcs) to stabilize BI/ML against schema churn.
Data contracts & CI/CD
– Contract files in repo (JSON/YAML) + CI gates that block breaking changes, PII regressions, or SLO backslides.
– Blue/green schema rollout with auto-deprecation notices.
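A CI gate of this kind can be as simple as comparing the current contract with the proposed one and failing the build on any regression. The sketch below covers only PII regressions and SLO backslides, and assumes both contracts are already parsed into dicts. The field names follow the sample contract in this post, while `contract_gate` itself is a hypothetical helper.

```python
def contract_gate(current, proposed):
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []

    # PII regression: masking may be turned on, never silently turned off.
    if current["policies"]["pii_masking"] and not proposed["policies"]["pii_masking"]:
        failures.append("PII regression: pii_masking was turned off")

    # SLO backslides: freshness may only get tighter, quality floor may only rise.
    current_slo, proposed_slo = current["slo"], proposed["slo"]
    if proposed_slo["freshness_minutes_p95"] > current_slo["freshness_minutes_p95"]:
        failures.append("SLO backslide: freshness_minutes_p95 got looser")
    if proposed_slo["dq_score_min"] < current_slo["dq_score_min"]:
        failures.append("SLO backslide: dq_score_min was lowered")

    return failures


current = {
    "policies": {"pii_masking": True},
    "slo": {"freshness_minutes_p95": 15, "dq_score_min": 99.5},
}
proposed = {
    "policies": {"pii_masking": True},
    "slo": {"freshness_minutes_p95": 30, "dq_score_min": 99.5},
}

failures = contract_gate(current, proposed)
for failure in failures:
    print(failure)  # SLO backslide: freshness_minutes_p95 got looser
```

In a pipeline, the job would exit non-zero whenever `failures` is non-empty, so the change cannot merge until the contract owner either fixes it or explicitly renegotiates the SLO.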
Observability & Telemetry
– Pipeline/runtime monitors + in-product usage: who queries what, top errors, slowest joins, broken dashboards.
– Cost telemetry per product to drive cost-to-serve reductions.
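Most of this telemetry reduces to a few aggregations over a query log. The following sketch assumes a hypothetical log shape (one dict per query with `product`, `user`, `cost_usd`, and `error` fields) and made-up product names; the real fields depend on the query engine.

```python
from collections import Counter, defaultdict

# Hypothetical query log entries, one per executed query.
query_log = [
    {"product": "underwriting_features", "user": "bi_author", "cost_usd": 0.5, "error": None},
    {"product": "underwriting_features", "user": "modeler", "cost_usd": 0.25, "error": "column not found"},
    {"product": "claims_summary", "user": "bi_author", "cost_usd": 1.0, "error": "column not found"},
    {"product": "claims_summary", "user": "ops", "cost_usd": 2.0, "error": "timeout"},
]


def summarize(log, top_n=3):
    """Aggregate who queries what, cost per product, and the most common errors."""
    queries = Counter()
    cost_by_product = defaultdict(float)
    errors = Counter()
    for entry in log:
        queries[(entry["user"], entry["product"])] += 1
        cost_by_product[entry["product"]] += entry["cost_usd"]
        if entry["error"]:
            errors[entry["error"]] += 1
    return queries, dict(cost_by_product), errors.most_common(top_n)


queries, cost_by_product, top_errors = summarize(query_log)
print(cost_by_product)  # {'underwriting_features': 0.75, 'claims_summary': 3.0}
print(top_errors)       # [('column not found', 2), ('timeout', 1)]
```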
Weekly
– Customer interviews (30–45 min). Update OST with new opportunities.
– Review usage dashboards (adoption, failures, time-to-first-value).
– Choose 1–2 assumption tests (thin slices, not epics).
Bi-weekly
– Ship thin slices behind feature flags (e.g., new dimension, lineage view, sample notebook).
– Run A/B or shadow comparisons on model changes.
Monthly
– SLO review + error budget burn; decide on pace vs. reliability.
– Cost review; identify expensive query patterns; push semantics or aggregates closer to the product.
Quarterly
– Outcome check (did we move the North Star?).
– Retire unused features (measured by telemetry), cut cost, simplify schema.
product: abc_features
version: 1.6.0
owners:
  product_manager: jez@data-lingua.com
  tech_lead: alice@data-lingua.com
interface:
  access: delta_share://risk/underwriting_features
  schema_ref: schema/underwriting_features.avsc
  read_methods: [sql, python, dbt, spark]
  semantics_ref: docs/semantics.md
slo:
  freshness_minutes_p95: 15
  availability_rolling_30d: "99.9%"
  dq_score_min: 99.5
policies:
  pii_masking: true
  row_filters:
    region: consumer_region IN ('EU','UK','US')
versioning:
  deprecation_policy_days: 90
observability:
  checks: [freshness, volume, drift, pk_uniqueness, null_spikes]
...