# Reuse Data Products

When one Data Product publishes trusted output tables, another Data Product can read from them directly instead of rebuilding the same logic. The upstream product owns the source-aligned facts or dimensions. The downstream product declares those outputs as read-only inputs, transforms them, and publishes its own models, semantic layer, and metrics.

This recipe walks through two example projects:

| Project                         | Role                | What it does                                                               |
| ------------------------------- | ------------------- | -------------------------------------------------------------------------- |
| `examples/customer_orders_dp`   | Upstream producer   | Publishes clean customer and order tables.                                 |
| `examples/customer_insights_dp` | Downstream consumer | Reads the upstream tables and builds customer revenue and health insights. |

***

## How it works

Apply the upstream Data Product first. It materializes tables in its own schema:

```
customer_orders.customer_profiles
customer_orders.order_facts
```

The downstream Data Product then declares those tables in its input contract and references them in SQL models:

```
customer_orders_dp
  -> customer_orders.customer_profiles
  -> customer_orders.order_facts
      -> customer_insights_dp
          -> customer_insights.customer_revenue_summary
          -> customer_insights.customer_health_segments
          -> customer_lifetime_value metric
```

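The `customer-orders` product (project `customer_orders_dp`) owns the customer and order contract. The `customer-insights` product (project `customer_insights_dp`) owns the downstream analytics built on top of it. Each product controls its own outputs and is planned and applied on its own, with the upstream product applied first.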
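
The apply order follows from the direction of the dependency. As a rough illustration in plain Python (not part of Vulcan), the sketch below models the two projects as a small dependency graph and derives the order in which to apply them. The `reads_from` mapping is a hypothetical structure introduced only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: each project -> the projects whose outputs it reads.
reads_from = {
    "customer_orders_dp": set(),                      # upstream, reads no other product
    "customer_insights_dp": {"customer_orders_dp"},   # downstream, reads the upstream outputs
}

# static_order() yields dependencies before the projects that depend on them.
apply_order = list(TopologicalSorter(reads_from).static_order())
print(apply_order)
```

With more than two products, the same idea gives a safe apply order for a whole chain of producers and consumers.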

***

## Step 1: Build the upstream Data Product

The upstream project is `examples/customer_orders_dp`. Its `config.yaml` declares it as a source-aligned product:

```yaml
name: customer-orders
displayName: Customer Orders Data Product
description: >
  Upstream data product providing clean customer profiles and order facts.
version: 0.1.0
alignment: sourceAligned
domain: commerce

modelDefaults:
  dialect: postgres
  cron: '@daily'
```

It publishes two output tables:

| Output table                        | Purpose                                                                                |
| ----------------------------------- | -------------------------------------------------------------------------------------- |
| `customer_orders.customer_profiles` | Customer-level dimension with plan, tier, status, signup date, and PII fields.         |
| `customer_orders.order_facts`       | Order-level fact table with gross amount, discounts, refunds, status, and net revenue. |

These output tables are the contract that downstream products depend on. `customer_orders.order_facts` exposes `customer_id`, `order_date`, `status`, and `net_revenue` as stable columns that downstream products can safely join and aggregate.

Apply this product before working on the downstream project:

```bash
cd examples/customer_orders_dp
vulcan plan
vulcan apply
```

***

## Step 2: Set up the downstream Data Product

The downstream project is `examples/customer_insights_dp`. Its `config.yaml` marks it as a consumer-aligned product:

```yaml
name: customer-insights
displayName: Customer Insights Data Product
description: >
  Downstream analytics data product that reads from the customer-orders data product.
version: 0.1.0
alignment: consumerAligned
domain: commerce

modelDefaults:
  dialect: postgres
  cron: '@daily'
```

This product does not recreate customer or order logic. It reads the upstream output tables and layers customer success analytics on top.

***

## Step 3: Declare the upstream tables

Before writing any SQL, the downstream project needs to tell Vulcan which upstream tables it reads. Declare them in `inputs.yaml`:

```yaml
- name: '"warehouse"."customer_orders"."customer_profiles"'
  columns:
    customer_id: INT
    email: TEXT
    full_name: TEXT
    country: TEXT
    plan_type: TEXT
    signup_date: DATE
    status: TEXT
    customer_tier: TEXT
    days_since_signup: INT

- name: '"warehouse"."customer_orders"."order_facts"'
  columns:
    order_id: INT
    customer_id: INT
    order_date: DATE
    gross_amount: DECIMAL(12, 2)
    discount_amount: DECIMAL(12, 2)
    refund_amount: DECIMAL(12, 2)
    status: TEXT
    net_revenue: DECIMAL(12, 2)
    is_refunded: BOOLEAN
```

`inputs.yaml` is the read-only contract between the two products. Vulcan uses it to validate downstream SQL at plan time and to record cross-product lineage. The downstream product reads these tables but does not own them.

Use the fully qualified table name that matches the warehouse, schema, and table produced by the upstream product.
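
To make the naming convention concrete, here is a minimal Python sketch that splits a quoted, fully qualified name from `inputs.yaml` into its warehouse, schema, and table parts. The helper `split_qualified_name` is hypothetical and only illustrates the structure of the name; it is not how Vulcan parses identifiers.

```python
import re


def split_qualified_name(name: str) -> tuple[str, ...]:
    """Split a dotted identifier such as '"a"."b"."c"' into its parts."""
    # Prefer double-quoted parts; fall back to a plain dot split for unquoted names.
    parts = re.findall(r'"([^"]+)"', name)
    if not parts:
        parts = name.split(".")
    return tuple(parts)


warehouse, schema, table = split_qualified_name(
    '"warehouse"."customer_orders"."customer_profiles"'
)
print(warehouse, schema, table)
```

The schema and table parts should correspond to the `customer_orders.customer_profiles` table that the upstream product materializes.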

***

## Step 4: Build downstream models

With the upstream tables declared, write the downstream models. The first model joins the upstream customer and order tables into a revenue summary:

```sql
MODEL (
  name customer_insights.customer_revenue_summary,
  kind FULL,
  grains [customer_id]
);

SELECT
  cp.customer_id,
  cp.plan_type,
  cp.country,
  cp.customer_tier,
  cp.status,
  COUNT(o.order_id) AS total_orders,
  COALESCE(SUM(o.net_revenue) FILTER (WHERE o.status = 'completed'), 0) AS total_lifetime_value,
  MIN(o.order_date) AS first_order_date,
  MAX(o.order_date) AS last_order_date
FROM customer_orders.customer_profiles AS cp
LEFT JOIN customer_orders.order_facts AS o
  ON cp.customer_id = o.customer_id
GROUP BY
  cp.customer_id,
  cp.plan_type,
  cp.country,
  cp.customer_tier,
  cp.status;
```
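
The `LEFT JOIN` combined with `COALESCE(... FILTER ...)` means every customer gets a row, `total_orders` counts all matched orders, and only `completed` orders contribute revenue. The Python sketch below mirrors that behavior on a few rows. The sample data and the `cancelled` status value are assumptions made for illustration; only `completed` comes from the model above.

```python
from decimal import Decimal

# Hypothetical sample rows shaped like the declared upstream tables.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "status": "completed", "net_revenue": Decimal("120.00")},
    {"order_id": 11, "customer_id": 1, "status": "cancelled", "net_revenue": Decimal("45.00")},
]


def summarize(customer_id: int) -> dict:
    # LEFT JOIN: a customer with no orders simply matches zero rows.
    matched = [o for o in orders if o["customer_id"] == customer_id]
    # FILTER (WHERE o.status = 'completed'): only completed orders add revenue.
    completed = [o["net_revenue"] for o in matched if o["status"] == "completed"]
    return {
        "customer_id": customer_id,
        "total_orders": len(matched),                          # COUNT(o.order_id)
        "total_lifetime_value": sum(completed, Decimal("0")),  # COALESCE(SUM(...), 0)
    }


summary = [summarize(c["customer_id"]) for c in customers]
print(summary)
```

Customer 2 has no orders, so it still appears in the result with zero orders and zero lifetime value instead of being dropped.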

The second model reads from the first and classifies each customer into a health segment:

```sql
MODEL (
  name customer_insights.customer_health_segments,
  kind FULL,
  grains [customer_id]
);

SELECT
  customer_id,
  plan_type,
  country,
  total_lifetime_value,
  CASE
    WHEN status = 'churned' THEN 'churned'
    WHEN total_orders = 0 THEN 'new'
    WHEN total_lifetime_value >= 1000
     AND (CURRENT_DATE - last_order_date) <= 90 THEN 'champion'
    WHEN (CURRENT_DATE - last_order_date) > 90 THEN 'at_risk'
    ELSE 'growing'
  END AS health_segment
FROM customer_insights.customer_revenue_summary;
```
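
The `CASE` expression is evaluated top to bottom, so the order of the branches matters: `churned` wins over everything else, and `new` is checked before any date arithmetic on `last_order_date`. A minimal Python sketch of the same branch order follows. The function name, the `today` parameter, and the `active` status value are assumptions introduced for this example.

```python
from datetime import date


def health_segment(status, total_orders, total_lifetime_value, last_order_date, today):
    """Mirror the CASE branches of customer_health_segments, in the same order."""
    if status == "churned":
        return "churned"
    if total_orders == 0:
        return "new"  # no orders, so last_order_date is empty and is never inspected
    days_since_last_order = (today - last_order_date).days
    if total_lifetime_value >= 1000 and days_since_last_order <= 90:
        return "champion"
    if days_since_last_order > 90:
        return "at_risk"
    return "growing"


today = date(2024, 6, 30)  # fixed date so the example is reproducible
print(health_segment("active", 5, 1500, date(2024, 6, 1), today))
print(health_segment("active", 5, 1500, date(2024, 1, 1), today))
```

Note that a high-value customer whose last order is older than 90 days falls through to `at_risk`, not `champion`.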

The downstream DAG now carries two kinds of lineage: external lineage back to `customer_orders_dp`, and internal lineage from `customer_revenue_summary` to `customer_health_segments`.

***

## Step 5: Publish semantics and metrics

With the physical models in place, expose them through the semantic layer. This is what separates reusing a Data Product from simply reading a table: the downstream product gets its own governed query surface, independent of the upstream one.

Define a semantic model over the revenue summary:

```yaml
kind: semantic
name: customer_revenue_summary
depends_on: customer_insights.customer_revenue_summary

measures:
  - name: sum_lifetime_value
    type: sum
    expression: "{customer_revenue_summary.total_lifetime_value}"

segments:
  - name: paying_customers
    expression: "{customer_revenue_summary.customer_tier} = 'paying'"
```

Then publish a business metric on top of it:

```yaml
kind: metric
name: customer_lifetime_value
measure: customer_revenue_summary.sum_lifetime_value
ts: customer_revenue_summary.first_order_date
granularity: month
```

Consumers of `customer-insights` can now query `customer_lifetime_value` without needing to know anything about the upstream `customer-orders` product.
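
One plausible reading of this metric, which the configuration above does not spell out, is that `sum_lifetime_value` is summed per calendar month of `first_order_date`. The Python sketch below shows that interpretation on hypothetical rows; the actual evaluation is done by the semantic layer and may differ in detail.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical rows from customer_insights.customer_revenue_summary.
rows = [
    {"first_order_date": date(2024, 1, 5), "total_lifetime_value": Decimal("120.00")},
    {"first_order_date": date(2024, 1, 20), "total_lifetime_value": Decimal("80.00")},
    {"first_order_date": date(2024, 2, 3), "total_lifetime_value": Decimal("50.00")},
]

# Bucket each row by the first day of its month, then sum the measure per bucket.
monthly_lifetime_value = defaultdict(Decimal)
for row in rows:
    month = row["first_order_date"].replace(day=1)
    monthly_lifetime_value[month] += row["total_lifetime_value"]

for month, value in sorted(monthly_lifetime_value.items()):
    print(month.isoformat(), value)
```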

***

## Run order

Run the upstream product first, then the downstream product:

```bash
cd examples/customer_orders_dp
vulcan plan
vulcan apply

cd ../customer_insights_dp
vulcan plan
vulcan apply
```

If the upstream contract changes, update `inputs.yaml` in the downstream product before running the next plan.
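
Before editing `inputs.yaml`, it can help to see exactly what changed. The sketch below compares the columns the downstream product declares with the columns the upstream product now publishes. The `contract_drift` helper and the "published" column set (a widened `net_revenue` and a new `currency` column) are hypothetical; how you obtain the published columns depends on your warehouse.

```python
def contract_drift(declared: dict, published: dict) -> dict:
    """Compare declared input columns against the columns the upstream now publishes."""
    shared = declared.keys() & published.keys()
    return {
        "missing_upstream": sorted(declared.keys() - published.keys()),
        "type_changed": sorted(c for c in shared if declared[c] != published[c]),
        "not_declared": sorted(published.keys() - declared.keys()),
    }


# Subset of the order_facts columns declared in inputs.yaml.
declared = {"customer_id": "INT", "order_date": "DATE", "net_revenue": "DECIMAL(12, 2)"}
# Hypothetical new upstream version, invented for this example.
published = {
    "customer_id": "INT",
    "order_date": "DATE",
    "net_revenue": "DECIMAL(14, 2)",
    "currency": "TEXT",
}

drift = contract_drift(declared, published)
print(drift)
```

Columns under `missing_upstream` or `type_changed` are the ones most likely to break downstream SQL; `not_declared` columns only matter if the downstream product wants to start reading them.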

***

## Checklist

Before building one Data Product on top of another, confirm:

* The upstream product has been applied successfully.
* The upstream output tables are stable, documented, and owned by the upstream team.
* The downstream product declares every upstream table it reads in `inputs.yaml`.
* Table names in SQL match the declared upstream names exactly.
* The downstream product has its own `stateSchema` so its state is isolated from the upstream product.
* Semantic models and metrics are defined in the downstream product for its specific use case.

{% hint style="info" %}
For governed reuse, always go through `inputs.yaml`. Do not copy upstream SQL into the downstream project unless ownership of that logic is intentionally moving.
{% endhint %}
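
The two `inputs.yaml` items in the checklist can be spot-checked with a small script. The sketch below is a rough aid, not a Vulcan feature: it uses a regular expression rather than a SQL parser, compares names by schema and table only (assuming the warehouse part resolves by default), and ignores references outside the upstream schema such as the downstream product's own models. All helper names are hypothetical.

```python
import re


def schema_and_table(name: str) -> str:
    """Reduce a dotted, optionally quoted name to 'schema.table'."""
    parts = [part.strip('"') for part in name.split(".")]
    return ".".join(parts[-2:])


def undeclared_upstream_tables(sql, declared_inputs, upstream_schema="customer_orders"):
    """Return upstream tables referenced after FROM/JOIN but not declared as inputs."""
    declared = {schema_and_table(name) for name in declared_inputs}
    referenced = re.findall(r'\b(?:FROM|JOIN)\s+([\w."]+)', sql, flags=re.IGNORECASE)
    return {
        schema_and_table(ref)
        for ref in referenced
        if schema_and_table(ref).startswith(upstream_schema + ".")
        and schema_and_table(ref) not in declared
    }


declared_inputs = [
    '"warehouse"."customer_orders"."customer_profiles"',
    '"warehouse"."customer_orders"."order_facts"',
]
model_sql = """
FROM customer_orders.customer_profiles AS cp
LEFT JOIN customer_orders.order_facts AS o
  ON cp.customer_id = o.customer_id
"""

undeclared = undeclared_upstream_tables(model_sql, declared_inputs)
print(undeclared or "all upstream references are declared")
```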

