> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/build/stage-2-productize/define-models-and-logic/data-models/sql-models.md).

# SQL models

{% hint style="warning" %}
**Supported model kinds:** `VIEW`, `FULL`, `INCREMENTAL_BY_TIME_RANGE`, `INCREMENTAL_BY_UNIQUE_KEY`, `INCREMENTAL_BY_PARTITION`, `SEED`, `EMBEDDED`, `MANAGED`

SQL models support all available model kinds.
{% endhint %}

SQL models are the primary way to define transformations in Vulcan. Each `.sql` file in `models/` becomes a table in your warehouse. Vulcan manages versioning, dependency resolution, and safe deployment automatically.

***

## Model structure

Every SQL model file has two parts:

1. **`MODEL` block (DDL)**: declares the model's name, kind, schedule, and metadata.
2. **`SELECT` query (DML)**: the transformation logic.

The two parts are separated by a semicolon at the end of the `MODEL` block.

***

## Bronze layer example

Bronze models sit closest to the source. They select from external source tables, declare the schema, and attach data integrity assertions.

`models/bronze/orders.sql` from `orders-analytics`:

```sql
MODEL (
  name bronze.orders,
  kind FULL,
  cron '*/15 * * * *',
  owner 'johndoetmdcio',
  grains [order_id],
  tags ('generated-data', 'bronze', 'orders', 'sales'),
  description 'Generated order transactions with customer, warehouse, status, and timestamp.',
  columns (
    order_id INTEGER,
    customer_id INTEGER,
    order_date TIMESTAMP,
    warehouse_id INTEGER,
    order_status TEXT
  ),
  assertions (
    unique_values(columns := (order_id)),
    not_null(columns := (order_id, customer_id, order_date, warehouse_id, order_status)),
    accepted_values(column := order_status, is_in := ('Confirmed', 'Shipped', 'Cancelled')),
    order_status_lookup_consistency()
  )
);

SELECT
  order_id,
  customer_id,
  order_date,
  warehouse_id,
  order_status
FROM public.orders_ext;
```

What this model does:

* Reads from the external source table `public.orders_ext` (declared in `external_models.yaml`).
* Materializes a `FULL` table that rebuilds every 15 minutes.
* Declares the grain as `order_id` (each row must be uniquely identified by order).
* Runs four assertions on every materialization: uniqueness, null checks, allowed status values, and a custom assertion that verifies each status exists in the lookup seed.

***

## Silver layer example

Silver models apply business logic. They join multiple bronze tables, exclude cancelled orders from revenue figures using a macro, and compute derived metrics.

`models/silver/fct_daily_sales.sql` from `orders-analytics`:

{% code overflow="wrap" %}

```sql
MODEL (
  name silver.fct_daily_sales,
  kind FULL,
  cron '*/15 * * * *',
  owner 'johndoetmdcio',
  grains [order_date, region_id, customer_id, product_id],
  description 'Daily sales fact table by order date, customer, product, and region.',
  tags ('silver', 'fact', 'daily-sales', 'revenue'),
  terms ('sales.daily_metrics', 'revenue.analytics'),
  columns (
    order_date TIMESTAMP,
    region_id INTEGER,
    region_name TEXT,
    customer_id INTEGER,
    product_id INTEGER,
    category TEXT,
    total_orders INTEGER,
    total_items_sold INTEGER,
    total_revenue NUMERIC(15, 2),
    avg_order_value NUMERIC(15, 2),
    total_shipments INTEGER,
    shipment_rate NUMERIC(10, 4)
  ),
  column_descriptions (
    order_date = 'Order date at daily grain',
    region_id = 'Customer region identifier',
    total_orders = 'Distinct orders at this grain',
    total_revenue = 'Gross revenue from line items',
    avg_order_value = 'Revenue divided by order count',
    shipment_rate = 'Shipped orders divided by total orders'
  ),
  assertions (
    not_null(columns := (order_date, region_id, region_name, customer_id, product_id, category)),
    forall(criteria := (total_orders >= 0, total_items_sold >= 0, total_revenue >= 0, total_shipments >= 0)),
    daily_sales_metric_consistency()
  )
);

WITH order_metrics AS (
  SELECT
    o.order_date::DATE AS order_date,
    c.region_id,
    r.region_name,
    o.customer_id,
    oi.product_id,
    p.category,
    COUNT(DISTINCT o.order_id)::INTEGER AS total_orders,
    SUM(oi.quantity)::INTEGER AS total_items_sold,
    SUM(oi.quantity * oi.unit_price)::NUMERIC(15, 2) AS total_revenue
  FROM bronze.orders AS o
  INNER JOIN bronze.customers AS c ON o.customer_id = c.customer_id
  INNER JOIN bronze.regions AS r ON c.region_id = r.region_id
  INNER JOIN bronze.order_items AS oi ON o.order_id = oi.order_id
  INNER JOIN bronze.products AS p ON oi.product_id = p.product_id
  WHERE @revenue_order_filter(o.order_status)
  GROUP BY o.order_date::DATE, c.region_id, r.region_name, o.customer_id, oi.product_id, p.category
),
shipment_metrics AS (
  SELECT
    o.order_date::DATE AS order_date,
    c.region_id,
    o.customer_id,
    oi.product_id,
    COUNT(DISTINCT s.shipment_id)::INTEGER AS total_shipments
  FROM bronze.orders AS o
  INNER JOIN bronze.customers AS c ON o.customer_id = c.customer_id
  INNER JOIN bronze.order_items AS oi ON o.order_id = oi.order_id
  LEFT JOIN bronze.shipments AS s ON o.order_id = s.order_id
  WHERE @revenue_order_filter(o.order_status)
  GROUP BY o.order_date::DATE, c.region_id, o.customer_id, oi.product_id
)
SELECT
  CAST(om.order_date AS TIMESTAMP) AS order_date,
  om.region_id,
  om.region_name,
  om.customer_id,
  om.product_id,
  om.category,
  om.total_orders,
  om.total_items_sold,
  ROUND(om.total_revenue, 2) AS total_revenue,
  @safe_ratio(om.total_revenue, om.total_orders, 2) AS avg_order_value,
  COALESCE(sm.total_shipments, 0) AS total_shipments,
  @safe_ratio(COALESCE(sm.total_shipments, 0), om.total_orders, 4) AS shipment_rate
FROM order_metrics AS om
LEFT JOIN shipment_metrics AS sm
  ON om.order_date = sm.order_date
  AND om.region_id = sm.region_id
  AND om.customer_id = sm.customer_id
  AND om.product_id = sm.product_id;
```

{% endcode %}

What this model does:

* Reads from five bronze tables and computes order metrics and shipment metrics in separate CTEs.
* Uses `@revenue_order_filter(o.order_status)` to exclude Cancelled orders from all revenue calculations. This macro expands to `order_status <> 'Cancelled'`.
* Uses `@safe_ratio(numerator, denominator, scale)` to compute ratios without dividing by zero.
* Declares a composite grain: one row per `order_date + region_id + customer_id + product_id`.
* Runs three assertions: not-null checks, a `forall` check that all numeric measures are non-negative, and a custom assertion checking metric consistency.

***

## Gold layer example

Gold models are analytics-ready. They consume silver models and produce outputs tailored to specific business use cases.

`models/gold/rfm_customer_segmentation.sql` from `orders-analytics`:

{% code overflow="wrap" %}

```sql
MODEL (
  name gold.rfm_customer_segmentation,
  kind FULL,
  cron '*/15 * * * *',
  owner 'johndoetmdcio',
  grains [customer_id],
  description 'RFM customer segmentation model for retention, lifecycle, and campaign prioritization.',
  tags ('gold', 'customer', 'rfm', 'segmentation'),
  assertions (
    unique_values(columns := (customer_id)),
    not_null(columns := (customer_id, customer_name, email, rfm_segment)),
    accepted_values(column := rfm_segment, is_in := (
      'Champions', 'Loyal Customers', 'Potential Loyalists', 'Recent Customers',
      'Promising', 'Customers Needing Attention', 'At Risk', 'Hibernating', 'Lost'
    )),
    rfm_score_consistency()
  )
);

WITH rfm_base AS (
  SELECT
    customer_id,
    customer_name,
    email,
    region_name,
    last_order_date,
    days_since_last_order AS recency_days,
    total_orders AS frequency_orders,
    total_revenue AS monetary_value,
    CASE
      WHEN days_since_last_order <= 30 THEN 5
      WHEN days_since_last_order <= 60 THEN 4
      WHEN days_since_last_order <= 90 THEN 3
      WHEN days_since_last_order <= 180 THEN 2
      ELSE 1
    END AS recency_score,
    CASE
      WHEN total_orders >= 4 THEN 5
      WHEN total_orders >= 3 THEN 4
      WHEN total_orders >= 2 THEN 3
      WHEN total_orders >= 1 THEN 2
      ELSE 1
    END AS frequency_score,
    CASE
      WHEN total_revenue >= 1000 THEN 5
      WHEN total_revenue >= 750 THEN 4
      WHEN total_revenue >= 500 THEN 3
      WHEN total_revenue > 0 THEN 2
      ELSE 1
    END AS monetary_score
  FROM silver.dim_customer_profile
)
SELECT
  customer_id,
  customer_name,
  email,
  region_name,
  last_order_date,
  recency_days,
  frequency_orders,
  monetary_value,
  recency_score,
  frequency_score,
  monetary_score,
  recency_score::TEXT || frequency_score::TEXT || monetary_score::TEXT AS rfm_score,
  CASE
    WHEN recency_score >= 4 AND frequency_score >= 4 AND monetary_score >= 4 THEN 'Champions'
    WHEN recency_score >= 3 AND frequency_score >= 4 THEN 'Loyal Customers'
    WHEN recency_score >= 4 AND frequency_score >= 2 THEN 'Potential Loyalists'
    WHEN recency_score >= 4 THEN 'Recent Customers'
    WHEN recency_score >= 3 AND monetary_score >= 2 THEN 'Promising'
    WHEN recency_score = 3 THEN 'Customers Needing Attention'
    WHEN recency_score <= 2 AND frequency_score >= 3 AND monetary_score >= 3 THEN 'At Risk'
    WHEN recency_score <= 2 AND frequency_score <= 2 THEN 'Hibernating'
    ELSE 'Lost'
  END AS rfm_segment,
  CASE
    WHEN recency_score >= 4 AND frequency_score >= 4 AND monetary_score >= 4 THEN 'Reward with VIP offers and early access'
    WHEN recency_score >= 3 AND frequency_score >= 4 THEN 'Upsell premium bundles and loyalty benefits'
    ...
  END AS recommended_action
FROM rfm_base;
```

{% endcode %}

What this model does:

* Reads from a single silver model (`silver.dim_customer_profile`) and scores each customer on recency, frequency, and monetary value.
* Derives the `rfm_score` (a three-digit string like `"555"`) and `rfm_segment` (a business label) using CASE expressions.
* Validates that each customer gets exactly one valid segment using `accepted_values` and the `rfm_score_consistency()` custom assertion.

***

## MODEL block properties

The `MODEL` block is the DDL for your model. Properties set here override `modelDefaults` from `config.yaml`.

### Required

| Property | Description                                                                             |
| -------- | --------------------------------------------------------------------------------------- |
| `name`   | Fully-qualified name: `schema.table_name`. Vulcan creates this table in your warehouse. |

### Scheduling and materialization

| Property | Description                                                                                                                 |
| -------- | --------------------------------------------------------------------------------------------------------------------------- |
| `kind`   | How the model materializes. Default: `VIEW`. See [Model kinds](/concepts/resources/vulcan/components/model/model_kinds.md). |
| `cron`   | Schedule for automatic runs (e.g. `@daily`, `*/15 * * * *`).                                                                |
| `start`  | Backfill start date for incremental models (e.g. `'2025-01-01'`).                                                           |

### Grain and keys

| Property | Description |
| -------- | ----------- |

\| `grains` | Columns that uniquely identify a row. Enables assertion checking and documents the primary key. |

### Documentation

| Property              | Description                                                                     |
| --------------------- | ------------------------------------------------------------------------------- |
| `description`         | Human-readable description. Appears in catalog search.                          |
| `column_descriptions` | Column-level documentation. Passed to the warehouse as column comments.         |
| `tags`                | Labels for search and filtering (e.g. `('silver', 'fact', 'daily-sales')`).     |
| `terms`               | Business glossary references using dot notation (e.g. `'sales.daily_metrics'`). |
| `owner`               | The person or team responsible for this model.                                  |

### Assertions

Attach data quality rules directly in the `MODEL` block. These run automatically on every materialization and block the model if they fail:

```sql
assertions (
  not_null(columns := (order_id, customer_id)),
  unique_values(columns := (order_id)),
  accepted_values(column := order_status, is_in := ('Confirmed', 'Shipped', 'Cancelled')),
  forall(criteria := (total_revenue >= 0))
)
```

See [Assertions](/build/stage-2-productize/define-the-contract/assertions.md) for the full list of built-in assertion functions.

***

## Writing the SELECT query

The `SELECT` query follows the `MODEL` block, separated by a semicolon. Write standard SQL for your target warehouse dialect.

**Reference another model by its fully-qualified name.** Vulcan detects the dependency automatically:

```sql
SELECT
  o.order_id,
  o.customer_id,
  c.region_id
FROM bronze.orders AS o
INNER JOIN bronze.customers AS c ON o.customer_id = c.customer_id
```

**Always cast output columns to explicit types.** This makes the schema predictable and prevents surprises during materialization:

```sql
SELECT
  COUNT(DISTINCT o.order_id)::INTEGER AS total_orders,
  SUM(oi.quantity * oi.unit_price)::NUMERIC(15, 2) AS total_revenue
FROM bronze.orders AS o
```

***

## Next steps

* See [Model kinds](/concepts/resources/vulcan/components/model/model_kinds.md) to choose the right materialization strategy.
* See [Macros](/build/stage-2-productize/define-models-and-logic/data-models/macros.md) to write reusable functions like `@safe_ratio` and `@revenue_order_filter`.
* See [Define the contract](/build/stage-2-productize/define-the-contract.md) to add tests and quality rules.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/build/stage-2-productize/define-models-and-logic/data-models/sql-models.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
