# Connect to engine

Before you can run `vulcan plan` or materialize any models, Vulcan needs to know which warehouse engine to connect to and how to authenticate.

Engine configuration lives inside the `gateways` block of `config.yaml`. Each gateway entry holds a `connection` object whose `type` field determines which engine adapter Vulcan uses.

***

## How engine selection works

When you run `vulcan init`, the CLI scaffolds a minimal `config.yaml` with a default gateway. You fill in the connection parameters for your engine, then verify the connection with `vulcan info` and `vulcan plan dev`.

The `type` field inside the `connection` block selects the engine adapter:

```yaml
gateways:
  default:
    connection:
      type: <engine-type>   # depot | postgres | snowflake | databricks | spark | trino
      # ...engine-specific fields
```

### Connecting via DataOS Depot (recommended for production)

When deploying on DataOS, use `type: depot` instead of raw credentials. This delegates connection management to a DataOS depot, which handles credentials securely without exposing them in `config.yaml`.

The `orders-analytics` data product uses this pattern for its PostgreSQL database:

```yaml
gateways:
  default:
    connection:
      type: depot
      address: dataos://postgresDepot
```

The depot (`postgresDepot`) is registered separately in DataOS and holds the host, port, database name, and credentials. Your project only needs the depot address.

### Connecting with raw credentials (for local development)

When working locally before deploying to DataOS, connect directly using environment variables. The `orders-analytics` local setup reads credentials from a `.env` file:

```yaml
gateways:
  default:
    connection:
      type: postgres
      host: "{{ env_var('PG_HOST') }}"
      port: 5432
      database: "{{ env_var('PG_DATABASE') }}"
      user: "{{ env_var('PG_USER') }}"
      password: "{{ env_var('PG_PASSWORD') }}"
      sslmode: require
    stateConnection:
      type: duckdb
      database: ./.state/vulcan.db

defaultGateway: default

modelDefaults:
  dialect: postgres
```

The `stateConnection` points at DuckDB for local state storage. Vulcan uses this to track which model intervals have been processed. In production (on DataOS), the depot handles state automatically.

Never write raw credentials directly in `config.yaml`. Always use `{{ env_var('VAR_NAME') }}` to pull them from the environment.
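The local setup above keeps credentials in a `.env` file, but this page does not say how that file reaches the environment. One option, sketched below, is to load it into your shell before running Vulcan. The `load_env` helper is a hypothetical name, not part of Vulcan, and it assumes the file contains only plain `KEY=value` lines that are safe to source.

```bash
# Hypothetical helper (not a Vulcan command): load KEY=value pairs from a
# dotenv-style file and export them so child processes can read them.
# Pass a path that contains a slash (for example ./.env) so the shell
# does not search PATH for the file.
load_env() {
  local env_file="${1:-./.env}"
  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  set -a            # auto-export every variable assigned from here on
  . "$env_file"     # run the assignments in the current shell
  set +a            # restore normal, non-exporting assignment
}

# Usage (assumes ./.env defines PG_HOST, PG_DATABASE, PG_USER, PG_PASSWORD):
# load_env ./.env && vulcan info
```

Keep the `.env` file out of version control, since it holds the same secrets you are keeping out of `config.yaml`.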

***

{% stepper %}
{% step %}

## Set up your engine

Choose the tab for your engine. If you already have a warehouse or engine instance, add its connection details to `config.yaml`. If you do not have one, use the local Docker setup where this guide provides one (Postgres and Spark).

{% tabs %}
{% tab title="Postgres" %}
**Use an existing Postgres instance:**

You need the host, port, database, user, and password. See the [Postgres connection options](/build/stage-2-productize/connect-to-engine/postgres.md#required-connection-options) for all supported fields.

Set your password as an environment variable:

{% tabs %}
{% tab title="Mac/Linux" %}

```bash
export POSTGRES_PASSWORD='your_password'
```

{% endtab %}

{% tab title="Windows" %}

```powershell
$env:POSTGRES_PASSWORD = 'your_password'
```

{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`:

```yaml
gateways:
  default:
    connection:
      type: postgres
      host: localhost
      port: 5433
      database: warehouse
      user: vulcan
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
model_defaults:
  dialect: postgres
```

**Start Postgres locally with Docker:**

Create the network once:

```bash
docker network create vulcan
```

Save this as `docker/docker-compose.warehouse.yml`:

```yaml
volumes:
  warehouse:
    driver: local
networks:
  vulcan:
    external: true
services:
  warehouse:
    image: postgres:17-alpine
    environment:
      POSTGRES_DB: warehouse
      POSTGRES_USER: vulcan
      POSTGRES_PASSWORD: vulcan
      POSTGRES_HOST_AUTH_METHOD: trust
    ports:
      - "5433:5432"
    volumes:
      - warehouse:/var/lib/postgresql/data
    networks:
      - vulcan
```

Start it:

```bash
docker compose -f docker/docker-compose.warehouse.yml up -d
```
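The container can take a few seconds to accept connections after `up -d` returns. A minimal sketch of waiting for it is shown below. The `retry` helper is a hypothetical name introduced here. The commented usage line assumes the compose file above, with its `warehouse` service, `vulcan` user, and `warehouse` database, and relies on `pg_isready`, which ships in the official Postgres image.

```bash
# Hypothetical helper: retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final failure.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to about 30 seconds for Postgres to accept connections.
# retry 30 1 docker compose -f docker/docker-compose.warehouse.yml \
#   exec -T warehouse pg_isready -U vulcan -d warehouse
```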

Full reference: [Connect engine → Postgres](/build/stage-2-productize/connect-to-engine/postgres.md).
{% endtab %}

{% tab title="Snowflake" %}
**Use an existing Snowflake instance:**

Use an existing Snowflake account and warehouse. No local Docker service is needed for Snowflake.

Set your password as an environment variable:

{% tabs %}
{% tab title="Mac/Linux" %}

```bash
export SNOWFLAKE_PASSWORD='your_password'
```

{% endtab %}

{% tab title="Windows" %}

```powershell
$env:SNOWFLAKE_PASSWORD = 'your_password'
```

{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`:

```yaml
gateways:
  default:
    connection:
      type: snowflake
      account: your_account
      user: your_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      warehouse: your_warehouse
      database: your_database
      role: your_role
model_defaults:
  dialect: snowflake
```

Full reference: [Connect engine → Snowflake](/build/stage-2-productize/connect-to-engine/snowflake.md).
{% endtab %}

{% tab title="Databricks" %}
**Use an existing Databricks workspace:**

Use an existing workspace with SQL warehouse or cluster access. No local Docker service is needed for Databricks.

Set the access token as an environment variable:

{% tabs %}
{% tab title="Mac/Linux" %}

```bash
export DATABRICKS_TOKEN='your_token'
```

{% endtab %}

{% tab title="Windows" %}

```powershell
$env:DATABRICKS_TOKEN = 'your_token'
```

{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`:

```yaml
gateways:
  default:
    connection:
      type: databricks
      server_hostname: your-workspace.azuredatabricks.net
      http_path: /sql/1.0/warehouses/your_warehouse_id
      access_token: "{{ env_var('DATABRICKS_TOKEN') }}"
      catalog: your_catalog

model_defaults:
  dialect: databricks
```

Full reference: [Connect engine → Databricks](/build/stage-2-productize/connect-to-engine/databricks.md).
{% endtab %}

{% tab title="Spark" %}
**Use an existing Spark cluster:**

If you already have a Spark cluster, use the `vulcan-cli` service below and update `spark.master` in `config.yaml` to point to your cluster.

Use this connection in `config.yaml`:

{% code overflow="wrap" %}

```yaml
gateways:
  default:
    connection:
      type: spark
      config:
        spark.master: spark://spark-master:7077
        spark.app.name: vulcan
        spark.sql.catalog.local: org.apache.iceberg.spark.SparkCatalog
        spark.sql.catalog.local.type: rest
        spark.sql.catalog.local.uri: http://iceberg-rest:8181
        spark.sql.catalog.local.warehouse: s3://warehouse/
        spark.sql.catalog.local.io-impl: org.apache.iceberg.aws.s3.S3FileIO
        spark.sql.catalog.local.s3.endpoint: http://minio:9000
        spark.sql.catalog.local.s3.path-style-access: "true"
        spark.hadoop.fs.s3a.access.key: admin
        spark.hadoop.fs.s3a.secret.key: password
        spark.hadoop.fs.s3a.endpoint: http://minio:9000
        spark.hadoop.fs.s3a.path.style.access: "true"

model_defaults:
  dialect: spark2
```

{% endcode %}

**Start Spark locally with Docker:**

Start a local Spark standalone cluster with MinIO and an Iceberg REST catalog.

This avoids Windows Hadoop or `winutils.exe` issues because the Spark driver runs inside Linux.

Place `vulcan-0.228.1.25-py3-none-any.whl` in your project root, then save this as `docker/docker-compose.spark.yml`:

{% code overflow="wrap" expandable="true" %}

```yaml
services:
  # Spark standalone cluster for running Spark executors in containers.
  spark-master:
    image: tmdcio/vulcan-spark-base:0.228.1.21
    container_name: spark-seeds-minimal-spark-master
    restart: unless-stopped
    command: ["/bin/bash", "-lc", "/opt/spark/sbin/start-master.sh --host 0.0.0.0 --port 7077 --webui-port 8080 && tail -f /opt/spark/logs/*"]
    ports:
      - "7077:7077"
      - "8080:8080"
    networks:
      - spark-seeds-minimal-net

  spark-worker:
    image: tmdcio/vulcan-spark-base:0.228.1.21
    container_name: spark-seeds-minimal-spark-worker
    restart: unless-stopped
    command: ["/bin/bash", "-lc", "/opt/spark/sbin/start-worker.sh spark://spark-master:7077 --webui-port 8081 && tail -f /opt/spark/logs/*"]
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    networks:
      - spark-seeds-minimal-net

  # MinIO for S3-compatible storage.
  minio:
    image: minio/minio:latest
    container_name: spark-seeds-minimal-minio
    restart: unless-stopped
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      spark-seeds-minimal-net:
        aliases:
          - minio
          - warehouse.minio
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 5s
      timeout: 5s
      retries: 10

  # MinIO setup - creates warehouse bucket.
  mc:
    image: minio/mc:latest
    container_name: spark-seeds-minimal-mc
    networks:
      - spark-seeds-minimal-net
    depends_on:
      minio:
        condition: service_healthy
    entrypoint: >
      /bin/sh -c "
        mc alias set minio http://minio:9000 admin password;
        mc mb --ignore-existing minio/warehouse;
        mc anonymous set public minio/warehouse;
        exit 0;
      "

  # Iceberg REST Catalog.
  iceberg-rest:
    image: tabulario/iceberg-rest:latest
    container_name: spark-seeds-minimal-iceberg-rest
    restart: unless-stopped
    ports:
      - "8181:8181"
    networks:
      - spark-seeds-minimal-net
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://minio:9000
    depends_on:
      minio:
        condition: service_healthy

networks:
  spark-seeds-minimal-net:
    driver: bridge

volumes:
  minio_data:
```

{% endcode %}

Start the Spark services:

```bash
docker compose -f docker/docker-compose.spark.yml up -d
```

Verify Vulcan through the CLI container:

```bash
docker compose -f docker/docker-compose.spark.yml run --rm vulcan-cli vulcan --version
```

Full reference: [Connect engine → Spark](/build/stage-2-productize/connect-to-engine/spark.md).
{% endtab %}

{% tab title="Trino" %}
**Use an existing Trino cluster:**

Use an existing Trino cluster with a configured catalog. No local Docker service is needed for Trino.

If your cluster requires a password, set it as an environment variable:

{% tabs %}
{% tab title="Mac/Linux" %}

```bash
export TRINO_PASSWORD='your_password'
```

{% endtab %}

{% tab title="Windows" %}

```powershell
$env:TRINO_PASSWORD = 'your_password'
```

{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`:

```yaml
gateways:
  default:
    connection:
      type: trino
      host: your_trino_host
      port: 8080
      user: your_user
      catalog: your_catalog
      http_scheme: https
      password: "{{ env_var('TRINO_PASSWORD') }}"

model_defaults:
  dialect: trino
```

Full reference: [Connect engine → Trino](/build/stage-2-productize/connect-to-engine/trino.md).
{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

## Initialize the Vulcan project

Run the initializer from your activated virtual environment:

```bash
vulcan init
```

The initializer creates a starter project structure. The `orders-analytics` project was expanded from this scaffold into a full medallion pipeline:

```
orders-analytics/
├── config.yaml
├── usage.yaml
├── external_models.yaml
├── audits/
├── dq/
├── macros/
│   └── __init__.py
├── models/
│   ├── bronze/
│   ├── silver/
│   ├── gold/
│   ├── seeds/
│   ├── semantics/
│   └── metrics/
├── seeds/
│   └── order_status_lookup.csv
├── linter/
│   └── linters.py
└── tests/
```

You are not required to use this exact layout, but organizing models by layer (bronze / silver / gold) makes dependencies easy to follow as the project grows.
{% endstep %}

{% step %}

## Verify the connection

After filling in the connection parameters, verify Vulcan can reach the engine:

```bash
vulcan info
```

A successful output confirms:

* `config.yaml` is valid and all required fields are set.
* The warehouse connection is reachable.
* The model directory structure is recognized.

Fix any errors shown before running the plan. Then run a quick plan to confirm models can be evaluated:

```bash
vulcan plan dev
```
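An unset credential variable is a common reason for this step to fail, so a small preflight check can save a round trip. The sketch below is one way to do it. The `require_env` helper is a hypothetical name, not a Vulcan command, and `POSTGRES_PASSWORD` is only an example taken from the Postgres tab.

```bash
# Hypothetical helper: fail early if any named variable is missing
# from the exported environment or is set to an empty string.
require_env() {
  local missing=0
  local name
  for name in "$@"; do
    # printenv only sees exported variables, which is what child
    # processes such as vulcan will see as well.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only call Vulcan when the credentials are present.
# require_env POSTGRES_PASSWORD && vulcan info
```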

{% endstep %}
{% endstepper %}

## Supported engines

`orders-analytics` uses Postgres. The table below lists the engines Vulcan supports. For engines other than Postgres, see [the engine configuration reference](/concepts/resources/vulcan/configurations/engines.md) in the Concepts section.

<table><thead><tr><th width="152.51904296875">Engine</th><th width="141.682861328125">type value</th><th align="center">Used by orders-analytics</th><th>Best for</th></tr></thead><tbody><tr><td>Postgres</td><td><code>postgres</code></td><td align="center">Yes</td><td>Smaller projects, development environments, full data control</td></tr><tr><td>Snowflake</td><td><code>snowflake</code></td><td align="center">No</td><td>Enterprise analytics, large-scale transformation</td></tr><tr><td>Trino</td><td><code>trino</code></td><td align="center">No</td><td>Interactive analytics, data lakes (Iceberg, Hive, Delta)</td></tr><tr><td>Spark</td><td><code>spark</code></td><td align="center">No</td><td>Large-scale batch processing, distributed compute</td></tr></tbody></table>

***

## Passing credentials securely

Never hard-code passwords in `config.yaml`. Use `{{ env_var('VAR_NAME') }}` to pull credentials from the environment:

```yaml
connection:
  type: postgres
  host: warehouse
  port: 5432
  database: warehouse
  user: vulcan
  password: "{{ env_var('DB_PASSWORD') }}"
```

Set the environment variable before running any Vulcan command:

```bash
export DB_PASSWORD=my-secret-password
```
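To catch a hard-coded secret before it is committed, you could scan `config.yaml` for credential fields that do not use `env_var`. The sketch below is a rough heuristic, not a Vulcan feature. The `check_config_secrets` name is hypothetical, and it inspects only the `password` and `access_token` keys used on this page.

```bash
# Hypothetical helper: report credential fields in a config file whose
# value is written inline instead of being read through env_var(...).
check_config_secrets() {
  local config_file="${1:-config.yaml}"
  local offenders
  if [ ! -f "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 2
  fi
  # Match "password:" or "access_token:" followed by a value, then drop
  # the lines that already use the env_var template.
  offenders="$(grep -nE '^[[:space:]]*(password|access_token):[[:space:]]*[^[:space:]#]' "$config_file" \
    | grep -v 'env_var(' || true)"
  if [ -n "$offenders" ]; then
    echo "hard-coded credentials in $config_file:" >&2
    echo "$offenders" >&2
    return 1
  fi
  return 0
}

# Usage:
# check_config_secrets config.yaml && vulcan plan dev
```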

