# External Trino

Use this setup when you already have a running Trino cluster and want Vulcan to connect to it. Vulcan connects to the existing cluster; it does not create a new one.

This covers the following Trino-compatible systems:

* DataOS Minerva
* Starburst
* Self-hosted Trino
* Any other hosted Trino-compatible endpoint

***

### Core settings

| Setting                 | Value                       |
| ----------------------- | --------------------------- |
| Gateway connection type | `trino`                     |
| Model dialect           | `trino`                     |
| VDE                     | `false`                     |
| Scheduler               | Local or built-in scheduler |

Keep `vde: false` for all Trino connections. Trino gateways do not support VDE mode.

***

### Before you start

Gather these details and confirm access before writing any configuration:

* The host address and port of your Trino cluster.
* A Trino username.
* A password, token, or API key for that user.
* The name of the catalog you want to use, for example `s3depot`, `hive`, `iceberg`, or `delta`.
* Network access from where Vulcan runs to the Trino cluster.
* The Trino user must be able to create schemas, tables, and views in that catalog.

***

### Required permissions

Your Trino user needs these permissions on the target catalog:

| Permission                                       | Why it is needed                                             |
| ------------------------------------------------ | ------------------------------------------------------------ |
| `CREATE SCHEMA` on the target catalog            | Vulcan creates schemas to store model output                 |
| `CREATE TABLE` and `CREATE VIEW` on schemas      | Vulcan creates tables and views when building models         |
| `SELECT`, `INSERT`, `UPDATE`, `DELETE` on tables | Vulcan reads and writes model data                           |
| `DROP TABLE`                                     | Vulcan cleans up tables during development and full rebuilds |

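One way to confirm these permissions before the first `vulcan plan` is to run a throwaway create, write, and drop cycle with the Trino CLI. The bash sketch below is a hypothetical helper, not part of Vulcan: the function `permission_check_sql` and the default scratch schema `vulcan_permission_check` are assumptions. Row-level `UPDATE` and `DELETE` succeed only where the connector and table format support them. If `CREATE SCHEMA` fails because the catalog has no default location, see `schemaLocationMapping` below. The Trino CLI stops at the first failing statement by default, so drop the scratch schema manually if a run stops partway.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Vulcan): prints SQL that exercises each
# permission in the table above against a scratch schema.
# Usage: permission_check_sql <catalog> [schema]
permission_check_sql() {
  local catalog="$1"
  # The default scratch schema name is an assumption; pick one that does not exist.
  local schema="${2:-vulcan_permission_check}"
  local fq="${catalog}.${schema}"
  cat <<SQL
CREATE SCHEMA IF NOT EXISTS ${fq};
CREATE TABLE ${fq}.probe AS SELECT 1 AS id;
CREATE VIEW ${fq}.probe_view AS SELECT id FROM ${fq}.probe;
SELECT count(*) FROM ${fq}.probe_view;
INSERT INTO ${fq}.probe VALUES (2);
UPDATE ${fq}.probe SET id = 3 WHERE id = 2;
DELETE FROM ${fq}.probe WHERE id = 3;
DROP VIEW ${fq}.probe_view;
DROP TABLE ${fq}.probe;
DROP SCHEMA ${fq};
SQL
}

# Example run with the Trino CLI, which reads the password from TRINO_PASSWORD
# when --password is given:
#   permission_check_sql "$TRINO_CATALOG" > /tmp/vulcan_permission_check.sql
#   trino --server "https://$TRINO_HOST:$TRINO_PORT" --user "$TRINO_USER" \
#     --password --file /tmp/vulcan_permission_check.sql
```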
***

### Connection options

These fields go under `gateways.<name>.connection` in your `config.yaml`.

| Option                  | Description                                                                                                         | Required |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------- | :------: |
| `type`                  | Must be `trino`.                                                                                                    |    Yes   |
| `host`                  | The Trino cluster host address. Do not include `http://` or `https://`.                                             |    Yes   |
| `user`                  | The Trino username. For Starburst Galaxy, this may include a role suffix.                                           |    Yes   |
| `catalog`               | The default Trino catalog Vulcan should use.                                                                        |    Yes   |
| `port`                  | The Trino cluster port. Minerva commonly uses `7432`.                                                               |    No    |
| `httpScheme`            | `http` or `https`. Always use `https` in production.                                                                |    No    |
| `verify`                | Whether to verify the server's TLS certificate when using `https`.                                                  |    No    |
| `method`                | How to authenticate. Options: `basic`, `ldap`, `jwt`, `kerberos`, `certificate`, `oauth`, or `no-auth`.             |    No    |
| `password`              | Password or token for password-based auth.                                                                          |    No    |
| `roles`                 | Maps catalogs to roles, for role-based access control.                                                              |    No    |
| `httpHeaders`           | Extra HTTP headers to send with every request.                                                                      |    No    |
| `sessionProperties`     | Trino session properties to apply on each connection.                                                               |    No    |
| `retries`               | How many times to retry a failed request.                                                                           |    No    |
| `timezone`              | Timezone for the connection.                                                                                        |    No    |
| `schemaLocationMapping` | Maps a regex pattern to a storage location. Use this when the catalog cannot figure out where to store new schemas. |    No    |
| `catalogTypeOverrides`  | Tells Vulcan the connector type for a specific catalog, for example `iceberg`, `hive`, or `delta_lake`.             |    No    |

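Because `host` must be a bare hostname while the scheme and port go in `httpScheme` and `port`, a URL copied from a browser or a cluster console needs splitting. A minimal bash sketch, assuming the hypothetical function name `url_to_trino_env` and the environment variable names used in the minimal `config.yaml` on this page:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Vulcan): splits a pasted Trino URL into the
# separate values used by the connection options, because `host` must not
# include http:// or https://. IPv6 literals are not handled.
url_to_trino_env() {
  local url="$1" scheme="" rest port=""
  case "$url" in
    http://*)  scheme="http";  rest="${url#http://}" ;;
    https://*) scheme="https"; rest="${url#https://}" ;;
    *)         rest="$url" ;;
  esac
  rest="${rest%%/*}"          # drop any path or trailing slash
  if [[ "$rest" == *:* ]]; then
    port="${rest##*:}"        # text after the last colon
    rest="${rest%:*}"         # host without the port
  fi
  if [[ -n "$scheme" ]]; then printf 'TRINO_HTTP_SCHEME=%s\n' "$scheme"; fi
  printf 'TRINO_HOST=%s\n' "$rest"
  if [[ -n "$port" ]]; then printf 'TRINO_PORT=%s\n' "$port"; fi
}

# Example: export the parsed values for the current shell session.
#   while IFS= read -r line; do export "$line"; done \
#     < <(url_to_trino_env "https://trino.example.com:8443/ui")
```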
***

### Minimal config.yaml

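This is a minimal working configuration. It reads connection details and credentials from environment variables instead of hardcoding them. When unset, `TRINO_PORT`, `TRINO_HTTP_SCHEME`, and `TRINO_METHOD` fall back to `8080`, `https`, and `basic`; set `TRINO_PORT` explicitly if your cluster serves HTTPS on a different port.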

```yaml
vde: false

gateways:
  default:
    connection:
      type: trino
      host: "{{ env_var('TRINO_HOST') }}"
      port: "{{ env_var('TRINO_PORT', '8080') }}"
      user: "{{ env_var('TRINO_USER') }}"
      catalog: "{{ env_var('TRINO_CATALOG') }}"
      httpScheme: "{{ env_var('TRINO_HTTP_SCHEME', 'https') }}"
      method: "{{ env_var('TRINO_METHOD', 'basic') }}"
      password: "{{ env_var('TRINO_PASSWORD') }}"
      verify: true

defaultGateway: default

modelDefaults:
  dialect: trino
  start: "2024-01-01"
```

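You can catch a missing variable before Vulcan does with a short preflight check. This bash sketch is hypothetical (the function `check_trino_env` is not part of Vulcan) and assumes password-based authentication, so it treats `TRINO_PASSWORD` as required; remove it from the list if your authentication method does not use the `password` field.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not a Vulcan command): confirms that the
# environment variables read by the minimal config.yaml are set.
# Assumes password-based auth, so TRINO_PASSWORD is treated as required.
check_trino_env() {
  local missing=0 name
  for name in TRINO_HOST TRINO_USER TRINO_CATALOG TRINO_PASSWORD; do
    # ${!name} reads the variable whose name is stored in $name.
    if [[ -z "${!name:-}" ]]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # The config falls back to https when TRINO_HTTP_SCHEME is unset.
  if [[ "${TRINO_HTTP_SCHEME:-https}" != "https" ]]; then
    echo "warning: TRINO_HTTP_SCHEME=${TRINO_HTTP_SCHEME}; use https in production" >&2
  fi
  return "$missing"
}

# Example: check_trino_env && vulcan plan
```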
***

### DataOS Minerva

Minerva is DataOS's Trino-based query engine. Vulcan connects to it using `type: trino`, the same as any other Trino endpoint.

#### Minerva prerequisites

Have these ready before you start:

* A Minerva cluster already running in your DataOS tenant.
* The Minerva cluster name, for example `minervainfinity`.
* The Minerva host, for example `tcp.<env-name>.dataos.cloud`.
* The Minerva port, commonly `7432`.
* Your DataOS user ID.
* A DataOS API key.
* Your DataOS tenant name.
* The catalog name you want to query, for example `s3depot`.
* A DataOS secret that holds your user ID and a generated Minerva password (covered below).

#### Check if the cluster exists

List all Minerva clusters in your environment:

```bash
dataos-ctl resource get -t minerva -a
```

Inspect a specific cluster by name:

```bash
dataos-ctl resource get -t minerva -n <minerva-cluster-name>
```

Save the cluster name. You will need it when generating the Minerva password.

#### Example Minerva resource

A Minerva cluster resource looks like this:

```yaml
version: v1alpha
type: minerva
name: ${MINERVA_CLUSTER_NAME}
tags:
  - minerva
spec:
  coordinator:
    replicas: 1
    envs:
      JVM__opts: "--add-opens=java.base/java.nio=ALL-UNNAMED"
    resources:
      requests:
        cpu: "1"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
  worker:
    replicas: 2
    envs:
      JVM__opts: "--add-opens=java.base/java.nio=ALL-UNNAMED"
    resources:
      requests:
        cpu: "1"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
  compute: ${MINERVA_COMPUTE_NAME}
  depots:
    - address: "dataos://postgres?purpose=rw"
```

The `depots` list controls which DataOS data sources the Minerva cluster can access.

#### Generate the Minerva password

Minerva uses a base64-encoded JSON object as the password. It tells Minerva which cluster to connect to and which user is authenticating.

The JSON has three fields:

| Field     | What to put here          |
| --------- | ------------------------- |
| `cluster` | Your Minerva cluster name |
| `apikey`  | Your DataOS API key       |
| `tenant`  | Your DataOS tenant name   |

Run the command for your operating system to generate the encoded password:

{% tabs %}
{% tab title="macOS or Linux" %}

```bash
echo -n '{"cluster":"<minerva-cluster-name>","apikey":"<dataos-api-key>","tenant":"<tenant-name>"}' | base64 | tr -d '\n'; echo
```

{% endtab %}

{% tab title="Windows PowerShell" %}

```powershell
$json = '{"cluster":"<minerva-cluster-name>","apikey":"<dataos-api-key>","tenant":"<tenant-name>"}'
[Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($json))
```

{% endtab %}
{% endtabs %}

For example, with sample values filled in:

```bash
echo -n '{"cluster":"minervainfinity","apikey":"abc123","tenant":"qa"}' | base64 | tr -d '\n'; echo
```

Save the output. This is your `PASSWORD` value for the DataOS secret below.

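Before storing the password, you can decode it to confirm that it names the right cluster and tenant. A minimal bash sketch that wraps the commands above, assuming the hypothetical function names `minerva_password` and `decode_minerva_password`. Decoding prints your API key, so run it only in a private terminal.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not DataOS tools) that wrap the commands above.
# Assumes the values contain no double quotes or backslashes, because the
# JSON is built with printf rather than a JSON library.
minerva_password() {
  local cluster="$1" apikey="$2" tenant="$3"
  printf '{"cluster":"%s","apikey":"%s","tenant":"%s"}' "$cluster" "$apikey" "$tenant" \
    | base64 | tr -d '\n'
  echo
}

# Decodes a saved password so you can confirm which cluster and tenant it targets.
# --decode is accepted by both GNU coreutils and macOS base64.
decode_minerva_password() {
  printf '%s' "$1" | base64 --decode
  echo
}

# Example:
#   pw="$(minerva_password minervainfinity abc123 qa)"
#   decode_minerva_password "$pw"
```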
#### Minerva secret

Store your user ID and the generated password in a DataOS secret. Vulcan reads these values at runtime so credentials are not hardcoded in the config file.

```yaml
name: trino-connection-secret
version: v2alpha
type: secret
layer: user
secret:
  type: key-value
  data:
    USER_ID: "<dataos-user-id>"
    PASSWORD: "<base64-json-password>"
```

Apply the secret:

```bash
dataos-ctl resource apply -f trino-connection-secret.yaml
```

Verify it was created successfully:

```bash
dataos-ctl resource get -t secret -n trino-connection-secret
```

#### Map secret values into Vulcan

The secret values are projected into environment variables. The `config.yaml` then reads those variables at runtime.

```yaml
use:
  projection:
    secrets:
      - contextAlias: trn
        id: <tenant>:trino-connection-secret
    projections:
      envVars:
        - key: TRINO_USER
          template: "{{ secrets['trn'].USER_ID | base64_decode }}"
        - key: TRINO_PASSWORD
          template: "{{ secrets['trn'].PASSWORD | base64_decode }}"
```

Each secret key maps to an environment variable and a config field:

| Secret key | Becomes this env variable | Used in config.yaml as                        |
| ---------- | ------------------------- | --------------------------------------------- |
| `USER_ID`  | `TRINO_USER`              | `user: "{{ env_var('TRINO_USER') }}"`         |
| `PASSWORD` | `TRINO_PASSWORD`          | `password: "{{ env_var('TRINO_PASSWORD') }}"` |

#### Minerva config.yaml

```yaml
vde: false

gateways:
  default:
    connection:
      type: trino
      host: "tcp.<env-name>.dataos.cloud"
      port: 7432
      user: "{{ env_var('TRINO_USER') }}"
      catalog: "s3depot"
      httpScheme: https
      method: basic
      password: "{{ env_var('TRINO_PASSWORD') }}"
      verify: true

defaultGateway: default

modelDefaults:
  dialect: trino
  start: "2024-01-01"
```

***

### Starburst

Starburst uses the same connection setup as any other external Trino endpoint. The main thing to check is the username: some Starburst Galaxy setups require a role suffix, for example `analyst@example.com/analyst`. Ask your Starburst admin if you are unsure.

Store the full username, including any suffix, in an environment variable:

```yaml
user: "{{ env_var('STARBURST_USER') }}"
```

Starburst commonly uses these connection settings:

```yaml
httpScheme: https
port: 443
method: basic
```

***

### Self-hosted Trino

For a Trino cluster you manage yourself, confirm these four things before writing your config:

* Does the endpoint use `http` or `https`?
* Which authentication method is enabled on the cluster?
* Which catalog should Vulcan use?
* Can the catalog figure out where to store new tables, or does it need explicit locations?

If the catalog cannot infer storage locations, add `schemaLocationMapping` to tell Vulcan where to put new schemas:

```yaml
gateways:
  default:
    connection:
      type: trino
      host: "{{ env_var('TRINO_HOST') }}"
      user: "{{ env_var('TRINO_USER') }}"
      catalog: "{{ env_var('TRINO_CATALOG') }}"
      schemaLocationMapping:
        ".*": "s3://warehouse/vulcan/@{schema_name}"
```

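To answer the first question in the list above (`http` or `https`) and confirm network access from where Vulcan runs, you can probe the coordinator directly. A bash sketch with hypothetical function names (`trino_info_url`, `check_trino_reachable`), assuming `curl` is installed and the cluster exposes Trino's standard `/v1/info` endpoint:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Vulcan). They assume the coordinator
# serves Trino's standard /v1/info endpoint, which is normally reachable
# without credentials; a proxy in front of the cluster may behave differently.
trino_info_url() {
  local scheme="${1:-https}" host="$2" port="${3:-}"
  if [[ -n "$port" ]]; then
    printf '%s://%s:%s/v1/info\n' "$scheme" "$host" "$port"
  else
    printf '%s://%s/v1/info\n' "$scheme" "$host"
  fi
}

# Prints the coordinator's JSON response, or exits non-zero if the request fails.
check_trino_reachable() {
  curl --silent --show-error --fail --max-time 10 "$(trino_info_url "$@")"
}

# Example: try https first, then fall back to http to learn which one the endpoint uses.
#   check_trino_reachable https "$TRINO_HOST" "$TRINO_PORT" \
#     || check_trino_reachable http "$TRINO_HOST" "$TRINO_PORT"
```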
***

### Catalog and storage notes

Trino behavior depends on which catalog connector you use:

* Hive, Iceberg, and Delta Lake catalogs work well with Vulcan.
* Some catalogs automatically figure out where to store new tables based on a warehouse location.
* If your catalog has no default location, add `schemaLocationMapping` to your connection config.
* If Vulcan cannot tell whether a catalog is `hive`, `iceberg`, or `delta_lake`, use `catalogTypeOverrides` to set it explicitly.

Example:

```yaml
gateways:
  default:
    connection:
      type: trino
      catalogTypeOverrides:
        datalake: iceberg
        analytics: hive
```

***

### Materialization behavior

How Vulcan writes data to Trino depends on the model kind you use:

| Model kind                  | How Vulcan writes data                                     |
| --------------------------- | ---------------------------------------------------------- |
| `INCREMENTAL_BY_TIME_RANGE` | Deletes rows in the time range, then inserts new ones      |
| `INCREMENTAL_BY_UNIQUE_KEY` | Merges rows based on a unique key (catalog support varies) |
| `INCREMENTAL_BY_PARTITION`  | Deletes rows in the partition, then inserts new ones       |
| `FULL`                      | Replaces the entire table                                  |

***

### Quick checklist

Before running `vulcan plan`, confirm:

* `vde: false` is set in your config.
* `connection.type` is `trino`.
* `modelDefaults.dialect` is `trino`.
* Host, port, user, catalog, and password or token are all set.
* The Trino user has permission to create schemas, tables, and views in the target catalog.
* For Minerva: the DataOS secret is applied and the values are mapped to environment variables.

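The first three items can be checked mechanically. A rough bash sketch, with the hypothetical function name `check_vulcan_trino_config`; it only matches lines and their indentation, so it does not confirm that each key sits under the correct parent key.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not a Vulcan command) for the first three items
# in the checklist above. It matches lines with grep, so it assumes each key is
# written on its own line, as in the examples on this page.
check_vulcan_trino_config() {
  local file="${1:-config.yaml}" status=0
  if ! grep -Eq '^vde:[[:space:]]*false[[:space:]]*$' "$file"; then
    echo "missing: vde: false" >&2; status=1
  fi
  if ! grep -Eq '^[[:space:]]+type:[[:space:]]*trino[[:space:]]*$' "$file"; then
    echo "missing: connection type: trino" >&2; status=1
  fi
  if ! grep -Eq '^[[:space:]]+dialect:[[:space:]]*trino[[:space:]]*$' "$file"; then
    echo "missing: modelDefaults dialect: trino" >&2; status=1
  fi
  return "$status"
}

# Example: check_vulcan_trino_config config.yaml && vulcan plan
```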
***

### Next steps

After configuring Trino, continue through these stages in order: Connect to Engine, Define models, and Validate and test locally.