# Configure project

`config.yaml` is the central configuration file for your Vulcan project. It tells Vulcan how to connect to your warehouse, what defaults to apply across all models, and how to run governance controls like hooks and notifications.

Settings in this file apply project-wide. Individual models can override the defaults for scheduling and kind, but connection settings, hooks, and linter rules always apply to the whole project.

***

## usage.yaml

`usage.yaml` sits alongside `config.yaml` in the project root. It describes the business context of the data product: what it is good for, what it should not be used for, known caveats, and external references. The Data Product Hub reads this file and surfaces it to consumers.

`orders-analytics/usage.yaml`:

```yaml
good_for:
  - Daily and weekly sales performance reporting by region, category, customer, and product
  - title: Customer value and retention analysis
    details: Use customer profile and RFM semantic models to identify champions, at-risk customers, churned customers, and retention opportunities
  - Product revenue, unit velocity, and category performance analysis
  - Fulfillment conversion and shipment-rate monitoring by region

not_for:
  - Real-time order orchestration or shipment alerting
  - title: Financial close or statutory revenue reporting
    details: This data product uses synthetic generated demo data and excludes cancelled orders for analytics, not formal accounting
  - Inventory availability or warehouse capacity planning without additional source data

caveats:
  - Historical source data is generated by `infra/generate_orders_data.py` and can be regenerated with a different random seed
  - title: Demo refresh cadence
    details: Models are configured to refresh every 15 minutes for demonstration; generated sources are only updated when the loader is run
    severity: medium
  - title: Aggregated analytical models
    details: Silver and gold models are analytics-friendly aggregates and profiles, not raw operational event logs
    severity: low

references:
  - title: Vulcan book
    url: https://tmdc-io.github.io/vulcan-book/
    type: doc
```

| Field        | Required | Description                                                                                                              |
| ------------ | :------: | ------------------------------------------------------------------------------------------------------------------------ |
| `good_for`   |    No    | Use cases the data product is designed to support. Each entry is a plain string or an object with `title` and `details`. |
| `not_for`    |    No    | Use cases that are out of scope or would give misleading results.                                                        |
| `caveats`    |    No    | Known limitations consumers should be aware of. Objects can include `severity`: `low`, `medium`, or `high`.              |
| `references` |    No    | External links to documentation or related resources. Each entry takes `title`, `url`, and `type`.                       |
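
As a further illustration of the entry shapes described in the table, the sketch below mixes plain-string and object entries and marks one caveat as `high` severity. The descriptive text and the reference URL are hypothetical placeholders, not taken from `orders-analytics`, and `doc` is the only reference `type` shown on this page.

```yaml
# Hypothetical usage.yaml; all descriptive text is illustrative only.
good_for:
  - Weekly revenue reporting by region          # plain-string entry
  - title: Churn analysis                       # object entry
    details: Identify at-risk customers from order recency and frequency

caveats:
  - title: Late-arriving orders
    details: Orders can arrive up to a day late, so the most recent day may be incomplete
    severity: high                              # one of low, medium, high

references:
  - title: Project runbook
    url: https://example.com/runbook            # placeholder URL
    type: doc
```

Because every top-level field is optional, `not_for` is simply left out here.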

***

## config.yaml

{% tabs %}
{% tab title="Minimal" %}
The minimum required fields are a non-empty `name`, a non-empty `description`, at least one gateway with a working `connection`, `modelDefaults.dialect`, and at least one entry in `users`.

```yaml
name: my-data-product
description: My project description

gateways:
  default:
    connection:
      type: postgres
      host: localhost
      port: 5432
      database: mydb
      user: myuser
      password: "{{ env_var('PG_PASSWORD') }}"  # read from the environment, never hard-coded

modelDefaults:
  dialect: postgres

users:
  - username: my_dataos_user  # placeholder: replace with a valid DataOS username
```

{% endtab %}

{% tab title="Full example (orders-analytics)" %}
The full `config.yaml` from the `orders-analytics` data product. Each section is explained below.

{% code overflow="wrap" %}

```yaml
vde: false
name: orders-analytics
displayName: Orders Analytics Platform
description: Governed e-commerce order analytics for revenue, customer segmentation, product performance, fulfillment, and sales funnel monitoring on PostgreSQL.
discoverable: true
version: 0.1.2
alignment: sourceAligned

tags:
  - e-commerce
  - orders
  - sales-analytics
  - customer-analytics
  - postgres

terms:
  - glossary.data_product
  - glossary.orders
  - glossary.revenue
  - glossary.customer_segmentation

domain: sales_operations

linter:
  enabled: true
  warnRules:
    - RequireGrainForAllModels
    - RequireOwnerForAllModels
    - RequireAssertionsOrAuditsForAllModels
    - RequireDqForAnalyticsModels
    - RequireGeneratedBronzeSources
    - preferassertions
    - nomissingaudits

modelDefaults:
  dialect: postgres
  start: '2025-01-01'
  cron: '*/15 * * * *'

beforeAll:
  - CREATE SCHEMA IF NOT EXISTS bronze;
  - CREATE SCHEMA IF NOT EXISTS silver;
  - CREATE SCHEMA IF NOT EXISTS gold;
  - GRANT USAGE ON SCHEMA bronze, silver, gold TO db_owner;
  - ALTER DEFAULT PRIVILEGES IN SCHEMA bronze GRANT SELECT ON TABLES TO db_owner;
  - ALTER DEFAULT PRIVILEGES IN SCHEMA silver GRANT SELECT ON TABLES TO db_owner;
  - ALTER DEFAULT PRIVILEGES IN SCHEMA gold GRANT SELECT ON TABLES TO db_owner;
  - SET statement_timeout = '120s';
  - SET lock_timeout = '30s';
  - SET idle_in_transaction_session_timeout = '60s';

afterAll:
  - ANALYZE bronze.orders;
  - ANALYZE bronze.order_items;
  - ANALYZE silver.fct_daily_sales;
  - RESET statement_timeout;
  - RESET lock_timeout;
  - RESET idle_in_transaction_session_timeout;

gateways:
  default:
    connection:
      type: depot
      address: dataos://postgresDepot

notificationTargets:
  - type: console
    notifyOn:
      - apply_start
      - apply_end
      - apply_failure
      - run_start
      - run_end
      - run_failure
      - audit_failure
      - check_start
      - check_end
      - check_failure
      - migration_start
      - migration_end
      - migration_failure
      - plan_change

users:
  - username: johndoetmdcio
    email: john.doe@tmdc.io
    type: OWNER
  - username: data_team
    email: data-team@tmdc.io
    type: CONTRIBUTOR

variables:
  environment: local
  bronze_schema: bronze
  silver_schema: silver
  gold_schema: gold
  min_customer_count: 10
  min_order_count: 20
```

{% endcode %}
{% endtab %}
{% endtabs %}

***

## Configuration sections

### Project identity and metadata

Identity fields appear in catalog search results and the Data Product Hub. Apart from `vde`, which depends on the deployment mode, they describe the project and do not affect how Vulcan runs.

| Key            | Required | Description                                                                                                                           |
| -------------- | :------: | ------------------------------------------------------------------------------------------------------------------------------------- |
| `name`         |    Yes   | Project identifier. Can be overridden with the `DATAOS_RESOURCE_NAME` environment variable.                                           |
| `description`  |    Yes   | Project description. Must be non-empty.                                                                                               |
| `displayName`  |    No    | Human-readable name for UIs and docs.                                                                                                 |
| `discoverable` |    No    | Whether the product appears in catalog search. Default: `true`.                                                                       |
| `version`      |    No    | SemVer 2.0 release version (e.g. `0.1.2`).                                                                                            |
| `alignment`    |    No    | `sourceAligned` or `consumerAligned`. Orders-analytics uses `sourceAligned` because it is structured close to the operational source. |
| `tags`         |    No    | Labels for categorization and search.                                                                                                 |
| `terms`        |    No    | Business glossary terms using dot notation (e.g. `glossary.orders`).                                                                  |
| `domain`       |    No    | Business domain (e.g. `sales_operations`, `marketing`, `finance`).                                                                    |
| `vde`          |    No    | Set to `false` for Postgres and most engines. Set to `true` only for specific Vulcan deployment modes.                                |

***

### Gateways and connections

The `gateways` block tells Vulcan where to connect. In `orders-analytics`, the connection type is `depot`, which means Vulcan delegates authentication to a DataOS depot instead of using raw credentials.

```yaml
gateways:
  default:
    connection:
      type: depot
      address: dataos://postgresDepot
```

When working locally (outside DataOS), connect directly using raw credentials and environment variables:

```yaml
gateways:
  default:
    connection:
      type: postgres
      host: "{{ env_var('PG_HOST') }}"
      port: 5432
      database: "{{ env_var('PG_DATABASE') }}"
      user: "{{ env_var('PG_USER') }}"
      password: "{{ env_var('PG_PASSWORD') }}"
    stateConnection:
      type: duckdb
      database: ./.state/vulcan.db
```

Always use `{{ env_var('VAR_NAME') }}` to pull credentials from the environment. Never write passwords directly into `config.yaml`.

| Key                               | Required | Description                                                                                                      |
| --------------------------------- | :------: | ---------------------------------------------------------------------------------------------------------------- |
| `gateways.<name>.connection`      |    Yes   | Primary warehouse connection.                                                                                    |
| `gateways.<name>.stateConnection` |    No    | Where Vulcan stores internal state. Defaults to `connection` if not set. For local development, point at DuckDB. |
| `gateways.<name>.testConnection`  |    No    | Connection for running unit tests. Defaults to DuckDB.                                                           |
| `defaultGateway`                  |    No    | Which gateway to use when none is specified.                                                                     |
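
To show how the keys in the table fit together, here is a sketch with two gateways and an explicit `defaultGateway`. The gateway name `local` is a hypothetical choice, and the bare `type: duckdb` test connection assumes that DuckDB needs no further settings for unit tests; check both against your Vulcan version.

```yaml
gateways:
  default:                      # DataOS depot, as in orders-analytics
    connection:
      type: depot
      address: dataos://postgresDepot
  local:                        # hypothetical gateway name for local development
    connection:
      type: postgres
      host: "{{ env_var('PG_HOST') }}"
      port: 5432
      database: "{{ env_var('PG_DATABASE') }}"
      user: "{{ env_var('PG_USER') }}"
      password: "{{ env_var('PG_PASSWORD') }}"
    stateConnection:            # keep Vulcan state out of the warehouse
      type: duckdb
      database: ./.state/vulcan.db
    testConnection:             # assumption: explicit form of the documented DuckDB default
      type: duckdb

defaultGateway: default         # used when no gateway is specified
```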

***

### Model defaults

`modelDefaults` sets values that apply to all models unless a model overrides them individually. The `dialect` field is the only required field.

In `orders-analytics`, all models default to running every 15 minutes (`cron: '*/15 * * * *'`) and backfill from 1 January 2025 (`start: '2025-01-01'`):

```yaml
modelDefaults:
  dialect: postgres
  start: '2025-01-01'
  cron: '*/15 * * * *'
```

Individual models can override these. For example, a gold model that runs expensive queries might use `cron: '@hourly'` instead.

| Key       | Required | Description                                                                 |
| --------- | :------: | --------------------------------------------------------------------------- |
| `dialect` |    Yes   | SQL dialect for models (postgres, snowflake, bigquery, trino, spark, etc.). |
| `start`   |    No    | Default start date for backfilling.                                         |
| `cron`    |    No    | Default cron schedule (e.g. `@daily`, `0 0 * * *`, `*/15 * * * *`).         |
| `kind`    |    No    | Default model kind. Defaults to `VIEW` if not set.                          |
| `owner`   |    No    | Default owner for all models.                                               |
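
For comparison with the 15-minute demo cadence, a project that refreshes once a day might use a block like the sketch below. The `kind: VIEW` line only restates the documented default, and the exact spelling Vulcan accepts for kind values is an assumption here.

```yaml
modelDefaults:
  dialect: postgres      # required
  start: '2025-01-01'    # backfill from this date
  cron: '@daily'         # models without their own cron run once a day
  kind: VIEW             # assumption: explicit form of the documented default
```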

***

### Execution hooks

Hooks run SQL statements before or after `vulcan plan` and `vulcan run`. Use `beforeAll` for setup (creating schemas, granting permissions, setting timeouts) and `afterAll` for cleanup.

The `orders-analytics` `beforeAll` block creates the three output schemas if they do not exist, grants the `db_owner` role usage on those schemas and default `SELECT` privileges on their tables, and sets session timeouts as a safety guard:

```yaml
beforeAll:
  - CREATE SCHEMA IF NOT EXISTS bronze;
  - CREATE SCHEMA IF NOT EXISTS silver;
  - CREATE SCHEMA IF NOT EXISTS gold;
  - GRANT USAGE ON SCHEMA bronze, silver, gold TO db_owner;
  - ALTER DEFAULT PRIVILEGES IN SCHEMA bronze GRANT SELECT ON TABLES TO db_owner;
  - ALTER DEFAULT PRIVILEGES IN SCHEMA silver GRANT SELECT ON TABLES TO db_owner;
  - ALTER DEFAULT PRIVILEGES IN SCHEMA gold GRANT SELECT ON TABLES TO db_owner;
  - SET statement_timeout = '120s';
  - SET lock_timeout = '30s';
  - SET idle_in_transaction_session_timeout = '60s';
```

The `afterAll` block runs `ANALYZE` on frequently queried tables to update Postgres query planner statistics, then resets the session timeouts:

```yaml
afterAll:
  - ANALYZE bronze.orders;
  - ANALYZE bronze.order_items;
  - ANALYZE silver.fct_daily_sales;
  - RESET statement_timeout;
  - RESET lock_timeout;
  - RESET idle_in_transaction_session_timeout;
```

Hooks run inside the same database session as the plan. If a hook fails, the plan stops.

***

### Linter

The linter catches project-level issues when you run `vulcan plan`. `orders-analytics` enables custom warn rules that enforce project conventions:

```yaml
linter:
  enabled: true
  warnRules:
    - RequireGrainForAllModels
    - RequireOwnerForAllModels
    - RequireAssertionsOrAuditsForAllModels
    - RequireDqForAnalyticsModels
    - RequireGeneratedBronzeSources
    - preferassertions
    - nomissingaudits
```

Rules listed under `warnRules` produce warnings but do not fail the plan. Use them to enforce team conventions without blocking development.

To learn more about built-in rules, custom rule classes, and how to write your own, see the [Linter reference](/concepts/resources/vulcan/configurations/options/linter.md).
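
If some conventions should be treated more strictly than others, a configuration along these lines may work. It assumes that the `rules` key, which this page uses for `nomissingdataosusername`, also accepts other rule names and that rules listed there are enforced more strictly than `warnRules`; confirm both in the Linter reference before relying on it.

```yaml
linter:
  enabled: true
  rules:                         # assumption: enforced, not just warned
    - nomissingaudits
  warnRules:                     # reported as warnings; the plan still proceeds
    - RequireGrainForAllModels
    - RequireOwnerForAllModels
    - preferassertions
```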

***

### Notifications

Set up alerts when plans start, runs complete, or assertions fail. `orders-analytics` sends all events to the console:

```yaml
notificationTargets:
  - type: console
    notifyOn:
      - apply_start
      - apply_end
      - apply_failure
      - run_start
      - run_end
      - run_failure
      - audit_failure
      - check_start
      - check_end
      - check_failure
      - migration_start
      - migration_end
      - migration_failure
      - plan_change
```

To send notifications to Slack instead:

```yaml
notificationTargets:
  - type: slack
    url: "{{ env_var('SLACK_WEBHOOK_URL') }}"
    notifyOn:
      - run_end
      - audit_failure
      - plan_change
```
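
Because `notificationTargets` is a list, the two targets shown above can presumably be combined. The sketch below assumes that Vulcan delivers each event to every target whose `notifyOn` includes it; the split of events between the targets is only an illustration.

```yaml
notificationTargets:
  - type: console                # verbose local feedback
    notifyOn:
      - apply_start
      - apply_end
      - run_start
      - run_end
  - type: slack                  # only failures reach the team channel
    url: "{{ env_var('SLACK_WEBHOOK_URL') }}"
    notifyOn:
      - apply_failure
      - run_failure
      - audit_failure
```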

***

### Users

Declare the owner and contributors for the data product. These appear in the Data Product Hub and are used for contact and access control:

```yaml
users:
  - username: johndoetmdcio
    email: john.doe@tmdc.io
    type: OWNER
  - username: data_team
    email: data-team@tmdc.io
    type: CONTRIBUTOR
```

{% hint style="warning" %}
**`users` is required.** Config load fails if the list is empty or the key is omitted entirely. At least one user must be declared.
{% endhint %}

**Owner enforcement rules that apply at config load time:**

* **`modelDefaults.owner` default.** If you do not set `modelDefaults.owner`, it defaults to the `username` of the first entry in `users` (in the example above, `johndoetmdcio`).
* **`modelDefaults.owner` must be a listed user.** If you set `modelDefaults.owner` explicitly, it must match a `username` in `users`. Config load fails if it does not.
* **Per-model `owner` validation.** Each model's `owner` field is checked against `users` at load time. A model that names an owner not listed there is rejected.
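
The rules above can be sketched with the two users from the example. This assumes the default owner is written as `owner` under the `modelDefaults` key used elsewhere on this page; `analytics_guild` is a hypothetical username that is deliberately not listed.

```yaml
users:
  - username: johndoetmdcio     # first entry: default model owner if none is set
    email: john.doe@tmdc.io
    type: OWNER
  - username: data_team
    email: data-team@tmdc.io
    type: CONTRIBUTOR

modelDefaults:
  dialect: postgres
  owner: data_team              # valid: matches a username listed under users
  # owner: analytics_guild      # hypothetical, not listed under users: config load would fail
```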

| Field      | Required | Description                                                                         |
| ---------- | :------: | ----------------------------------------------------------------------------------- |
| `username` |    Yes   | DataOS username. Must be a valid user in the current tenant.                        |
| `email`    |    No    | Email address used for notifications.                                               |
| `type`     |    No    | Role: `OWNER` or `CONTRIBUTOR`. The first entry is used as the default model owner. |

**`noMissingDataOSUsername` linter rule.** Enable this built-in project-level rule to validate that every `username` in `config.users` is a recognized DataOS user in the current tenant. It runs once per project during `lint_models`:

```yaml
linter:
  enabled: true
  rules:
    - nomissingdataosusername
```

See the [Linter reference](/concepts/resources/vulcan/configurations/options/linter.md) for details on project-level rules.

***

### Variables

Project-level variables can be referenced in models and macros using `{{ var('variable_name') }}`. `orders-analytics` uses variables for schema names and threshold values:

```yaml
variables:
  environment: local
  bronze_schema: bronze
  silver_schema: silver
  gold_schema: gold
  min_customer_count: 10
  min_order_count: 20
```

A model or macro can then use:

```sql
WHERE total_customers >= {{ var('min_customer_count') }}
```

***

## Environment variables

A few values come from the shell or a `.env` file, not from YAML:

| Variable               | Effect                                                      |
| ---------------------- | ----------------------------------------------------------- |
| `DATAOS_TENANT_ID`     | Required at runtime. Supplies the `tenant`. Not a YAML key. |
| `DATAOS_RESOURCE_NAME` | Overrides `name` from `config.yaml`.                        |
| `DATAOS_RESOURCE_TAGS` | Merged into `tags` from `config.yaml`.                      |

***

## Validation rules

Vulcan validates `config.yaml` against the following rules:

* `name` must be non-empty.
* `description` must be non-empty.
* `users` must contain at least one entry.
* `users[].username` values must be valid DataOS usernames in the current tenant.
* `modelDefaults.owner`, when set explicitly, must match a `username` in `users`.
* Each model's `owner` field is validated against `users` at load time.
* `domain` is optional, but when set it must be a non-empty string.
* `version` must be valid SemVer 2.0 (e.g. `0.1.2`, not `v0.1.2`).
* `vde: true` is rejected for `spark` and `trino` gateways.
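
As a quick reference, this fragment annotates the identity fields from `orders-analytics` with the rule each one has to satisfy. It is not a complete `config.yaml`: gateways and `modelDefaults` are omitted, and the description is shortened for readability.

```yaml
name: orders-analytics                              # must be non-empty
description: Governed e-commerce order analytics    # must be non-empty
version: 0.1.2                                      # valid SemVer 2.0; v0.1.2 would be rejected
domain: sales_operations                            # optional, but must not be empty when present
vde: false                                          # true is rejected for spark and trino gateways

users:                                              # at least one entry
  - username: johndoetmdcio
```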

