> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/build/stage-2-productize/define-models-and-logic/data-models/seeds.md).

# Seeds

Seed models load a static CSV file into a warehouse table. Use them for reference data that rarely changes: status lookups, regions, product categories, pricing tiers, or any small table you want to version in Git and join against in your models.

***

## How seeds work

You point a `SEED` model at a CSV file, declare the schema, and Vulcan loads the CSV into a table in your warehouse. The data is only reloaded when the model definition or the CSV content changes. There is no unnecessary reloading on every run.

Seeds are a lightweight alternative to creating external models for small, team-maintained reference tables.

***

## Worked example: order\_status\_lookup

In `orders-analytics`, the `order_status_lookup` seed classifies each valid order status into a group, a fulfillment flag, a sort order, and a description. Every model that validates or displays order status references this table.

### The CSV file

`seeds/order_status_lookup.csv`:

```csv
order_status,status_group,is_fulfilled,is_active_sort,description
Confirmed,Open,false,1,Order has been accepted but not yet shipped
Shipped,Fulfilled,true,2,Order has been shipped to the customer
Cancelled,Closed,false,3,Order was cancelled and excluded from revenue metrics
```

The file has a header row matching the column names you will declare in the `SEED` model. Keep seed CSVs small and easy to read. If your reference table has thousands of rows, use an external model instead.

### The model file

`models/seeds/order_status_lookup.sql`:

```sql
MODEL (
  name bronze.order_status_lookup,
  kind SEED (
    path '../../seeds/order_status_lookup.csv'
  ),
  columns (
    order_status TEXT,
    status_group TEXT,
    is_fulfilled BOOLEAN,
    is_active_sort INTEGER,
    description TEXT
  ),
  grains [order_status],
  owner 'johndoetmdcio',
  tags ('seed-data', 'bronze', 'lookup', 'order-status'),
  description 'Static lookup table for order status classification used by order analytics models.',
  assertions (
    unique_values(columns := (order_status)),
    not_null(columns := (order_status, status_group, is_fulfilled, is_active_sort, description)),
    accepted_values(column := order_status, is_in := ('Confirmed', 'Shipped', 'Cancelled'))
  )
);
```

Key things to note:

* `path` is relative to the `.sql` model file, not the project root. The model is in `models/seeds/` and the CSV is in `seeds/`, so the path goes up two levels with `../../`.
* `grains [order_status]` declares that each row is uniquely identified by the order status value.
* The three assertions run on every load: uniqueness check on the grain, null check on all columns, and a value check that only the three expected statuses exist.

### Referencing the seed in an assertion

The `order_status_lookup_consistency` assertion uses the seed to check that every status in `bronze.orders` exists in the lookup table:

```sql
AUDIT (
  name order_status_lookup_consistency,
  blocking true
);

SELECT
  orders.order_id,
  orders.order_status,
  'Order status is missing from bronze.order_status_lookup' AS issue_type
FROM @this_model AS orders
LEFT JOIN bronze.order_status_lookup AS lookup
  ON orders.order_status = lookup.order_status
WHERE lookup.order_status IS NULL;
```

This assertion is attached to `bronze.orders` via the `assertions` block. If a new status appears in the source data that is not in the CSV, the assertion fails and blocks the model from promoting bad data.

### Referencing the seed in another model

Once materialized, reference the seed by its fully-qualified name like any other model:

```sql
SELECT
  o.order_id,
  o.order_status,
  l.status_group,
  l.is_fulfilled
FROM bronze.orders AS o
LEFT JOIN bronze.order_status_lookup AS l
  ON o.order_status = l.order_status
```

Vulcan detects the dependency and ensures `bronze.order_status_lookup` is materialized before any model that joins against it.

***

## Defining a seed model

The general pattern for any seed:

```sql
MODEL (
  name <schema>.<table_name>,
  kind SEED (
    path '<relative-path-to-csv>'
  ),
  columns (
    <column_name> <SQL_TYPE>,
    ...
  ),
  grains [<grain_column>],
  owner '<owner>',
  description '<description>',
  assertions (
    unique_values(columns := (<grain_column>)),
    not_null(columns := (<required_columns>))
  )
);
```

The `columns` block defines the schema. Vulcan uses this to create the table and cast CSV values to the correct types.

***

## CSV file structure

The CSV must have a header row matching the column names declared in `columns`. Place CSV files in the `seeds/` folder of your project:

```
seeds/
└── order_status_lookup.csv
```

***

## When to use seeds vs external models

| Use seeds when                                         | Use external models when                               |
| ------------------------------------------------------ | ------------------------------------------------------ |
| The data is small (a few dozen to a few thousand rows) | The data is large or comes from an external system     |
| The team maintains the data manually                   | The data is owned and updated by an external system    |
| You want to version the data in Git                    | You want to read the table in-place without copying it |
| You need it loaded into the warehouse for joins        | The table is already in the warehouse                  |

***

## Tips

* Always declare `grains` for seeds so Vulcan can validate uniqueness.
* Add `not_null` and `unique_values` assertions to catch data entry errors early.
* The `path` in the `MODEL` block is relative to the `.sql` file location, not the project root.
* Add any custom assertion that uses the seed table to the models that depend on it (as in `order_status_lookup_consistency`). This way, if the seed is missing a status that appears in production data, the assertion blocks the model before bad data propagates.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/build/stage-2-productize/define-models-and-logic/data-models/seeds.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
