# Managed Trino

Managed Trino attaches a Trino cluster to the data product deployment. Vulcan provisions and owns the coordinator and workers, and generates catalogs from your depots. You do not manage the cluster separately from the data product.

For the full reference covering local development, cluster configuration, failure modes, and performance tuning, see the Trino engine manual.

***

### How it works

One `vulcan-dg-trino` resource creates four cooperating resources:

```
<name>-trino           (vulcan)    Trino coordinator (replicas: 1)
<name>-trino-workers   (vulcan)    Trino workers (spec.trino.workers.replicas)
<name>-plan            (workflow)  vulcan plan --auto-apply (waits for cluster ready)
<name>-run             (workflow)  vulcan run (scheduled; depends on plan)
```

Catalogs are generated automatically. One Trino catalog is created per depot in `spec.depots[]`, and one per secret catalog entry if you define any.

***

### Catalog priority

The default catalog is used as the gateway catalog. It determines where unqualified model names resolve. Catalogs come from two sources: depots (`spec.depots[]`) and secret catalogs (`spec.trino.catalog.config[]`).

| Catalogs present               | Default catalog                                           |
| ------------------------------ | --------------------------------------------------------- |
| Depot catalogs only            | First depot in `spec.depots[]`                            |
| Secret catalogs only           | First secret catalog in `spec.trino.catalog.config[]`     |
| Both depot and secret catalogs | First secret catalog (takes priority over depot catalogs) |

Secret catalogs outrank depot catalogs. If you have both, the first secret catalog becomes the default, not the first depot. Order each list deliberately.
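The priority table can be restated as a small shell function. This is an illustrative sketch only: `default_catalog` is a hypothetical helper, not part of Vulcan or the DataOS CLI, and it assumes you pass in the first entry of each list yourself.

```bash
# Hypothetical helper that mirrors the priority table; not a Vulcan or DataOS command.
# Usage: default_catalog "<first depot catalog or empty>" "<first secret catalog or empty>"
default_catalog() {
  local first_depot="$1" first_secret="$2"
  if [ -n "$first_secret" ]; then
    # Secret catalogs outrank depot catalogs.
    printf '%s\n' "$first_secret"
  elif [ -n "$first_depot" ]; then
    printf '%s\n' "$first_depot"
  else
    echo "no catalogs defined" >&2
    return 1
  fi
}
```

For example, `default_catalog lakehousedepot postgresrr` prints `postgresrr`, because a secret catalog is present.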

Always use fully-qualified three-part names (`catalog.schema.name`) in every model, semantic, DQ, and source reference. This makes the materialization target explicit and removes any dependency on default catalog resolution.

***

### Prerequisites

Request the following from your DataOS platform team before you start:

* [ ] A Trino-capable compute pool. Look up the pool name with `dataos-ctl resource -t compute get -a`.
* [ ] The tenant-level `vulcan` stack is installed.
* [ ] One depot per source you read (Postgres, Snowflake, lakehouse). Each becomes a catalog.
* [ ] The first depot is a writable lakehouse depot. This is the materialization target.
* [ ] Tenant secrets exist: `vulcan-state-connection` (Postgres) and `vulcan-object-store-connection` (S3).
* [ ] A git-sync secret for the repo holding your model code.
* [ ] Python 3.10 locally. This is required for the Vulcan CLI only.

Verify your depots are visible:

```bash
dataos-ctl get depot
```

Every depot you plan to mount should appear in the output.
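If you mount several depots, the check can be scripted. The sketch below assumes that each depot name appears as a separate word somewhere in the command output; the exact output format is not documented on this page. `check_depots` is a hypothetical helper name.

```bash
# Hypothetical helper: reads a depot listing on stdin and reports which of the
# expected depot names (passed as arguments) do not appear in it.
# Assumption: names appear as whole words in the listing. grep -w treats "-" as a
# word boundary, so hyphenated names can match as prefixes of longer names.
check_depots() {
  local listing name status=0
  listing="$(cat)"
  for name in "$@"; do
    if ! grep -qw -- "$name" <<<"$listing"; then
      echo "missing depot: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (replace the placeholder names with your depots):
# dataos-ctl get depot | check_depots lakehousedepot postgresdepot
```

The function exits non-zero when any expected depot is absent, so it can gate a deploy script.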

***

### Project layout

A minimal managed Trino project:

```
my-trino-dp/
├── config.yaml
├── trino-server-deploy.yaml
├── models/
│   ├── bronze/orders.sql
│   └── semantics/orders.yml
└── dq/orders.yml
```

***

### config.yaml

The gateway connection is templated from environment variables the stack injects at runtime. Keep it generic so the same file works in both local development and production.

```yaml
name: my-trino-dp
display_name: "My Managed Trino DP"
tenant: "<tenant>"
description: "Managed Trino data product."
version: "0.1.0"

gateways:
  default:
    connection:
      type: trino
      catalog: "{{ env_var('TRINO_CATALOG') }}"
default_gateway: default

model_defaults:
  dialect: trino
  start: <YYYY-MM-DD>
  cron: "@daily"

linter:
  enabled: true
  rules:
    - ambiguousorinvalidcolumn
    - invalidselectstarexpansion
    - noambiguousprojections

ignore_patterns:
  - "trino-server-deploy.yaml"
```

Add your deploy manifest filename to `ignore_patterns`. Without this, Vulcan will try to parse it as a model.

***

### Add a model

Use fully-qualified three-part names (`catalog.schema.name`) for the model name and every source table. Use lowercase identifiers throughout. Trino lowercases unquoted names.

```sql
MODEL (
  name <lakehousedepot>.bronze.orders,
  kind FULL,
  cron '@daily',
  grain order_id,
  description 'Order transactions, full refresh from the Postgres source.',
  columns (
    order_id INTEGER,
    customer_id INTEGER,
    order_date TIMESTAMP,
    order_status VARCHAR
  ),
  assertions (
    unique_values(columns := (order_id)),
    not_null(columns := (order_id, customer_id, order_date))
  )
);

SELECT order_id, customer_id, order_date, order_status
FROM <postgresdepot>.public.orders_ext;
```

For columns in Iceberg-backed tables, prefer `TIMESTAMP(6)`. For time-range incremental models, filter the time column on a half-open interval: `>= @start_dt AND < @end_dt`.

Optional data-quality checks in `dq/orders.yml`:

```yaml
kind: dq
name: orders_dq
depends_on: <lakehousedepot>.bronze.orders
rules:
  - row_count > 0:
      name: has_rows
      dimension: completeness
  - duplicate_count(order_id) = 0:
      name: unique_order_id
      dimension: uniqueness
```

***

### trino-server-deploy.yaml

This file is the deploy manifest. The fields you will change most often are `depots` (order matters) and the JVM `-Xmx` value in each `jvmConfig` block.

```yaml
version: v1alpha
type: vulcan-dg-trino
name: my-trino-dp
owner: <owner>
description: "Managed Trino data product."
spec:
  runAsUser: "<owner>"
  compute: <trino-compute-pool>
  engine: trino
  repo:
    url: <https://your-vcs/your-repo>
    syncFlags:
      - "--ref=<branch>"
      - "--submodules=off"
    baseDir: <path/to/my-trino-dp>
    secretId: <tenant>:<git-sync-secret>

  depots:
    - dataos://<lakehousedepot>?purpose=rw   # first depot = default catalog and materialization target
    - dataos://<postgresdepot>?purpose=rw    # source

  trino:
    coordinator:
      trinoServerConfig:
        jvmConfig: |
          -server
          -Xmx4G
          -XX:+UseG1GC
          -XX:G1HeapRegionSize=32M
        logProperties: |
          io.trino=INFO
    workers:
      replicas: 1
      trinoServerConfig:
        jvmConfig: |
          -server
          -Xmx4G
          -XX:+UseG1GC
          -XX:G1HeapRegionSize=32M
        logProperties: |
          io.trino=INFO

  workflow:
    logLevel: INFO
    schedule:
      crons: ["0 */6 * * *"]
      endOn: "<YYYY-01-01T00:00:00-00:00>"
      timezone: "UTC"
      concurrencyPolicy: Forbid
    resource:
      request: { cpu: "1000m", memory: "2Gi" }
      limit:   { cpu: "2000m", memory: "4Gi" }
    plan:
      command: [vulcan]
      arguments: ["--log-to-stdout", "plan", "--auto-apply"]
    run:
      command: [vulcan]
      arguments: ["--log-to-stdout", "run"]

  use:
    projection:
      projections:
        envVars:
          - key: TRINO_CATALOG
            template: "<lakehousedepot>"

  api:
    replicas: 1
    resource:
      request: { cpu: "1000m", memory: "2Gi" }
      limit:   { cpu: "2000m", memory: "4Gi" }
```

Set `-Xmx` to fit the pod memory limit, in both the coordinator and worker `jvmConfig`. `4G` is the validated value for a `4Gi` pod limit. Leaving `-Xmx` unset is the most common cause of deployment failure.
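One quick way to confirm that both `jvmConfig` blocks set a heap size is to list every `-Xmx` flag in the manifest. This is a minimal sketch: `xmx_values` is a hypothetical helper name, and it does a plain text search rather than parsing the YAML.

```bash
# Hypothetical helper: print every -Xmx flag found on stdin, one per line.
# This is a text search, not a YAML parse, so it cannot tell which block a flag belongs to.
xmx_values() {
  grep -oE -- '-Xmx[0-9]+[kKmMgG]?'
}

# Example (path taken from the project layout on this page):
# xmx_values < my-trino-dp/trino-server-deploy.yaml
```

For the manifest shown above, two lines of output are expected: one for the coordinator and one for the workers.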

Set `endOn` one to two years out. A schedule that passes its `endOn` date stops running without raising an error.

The `TRINO_CATALOG` value under `use.projection` must match an actual catalog name, either a depot name or a secret catalog name from your lists. This sets the gateway and default catalog.
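The mapping from a manifest entry to the catalog name can be sketched as below. It assumes, based on the examples on this page, that a depot address `dataos://<depot>?purpose=rw` yields the catalog `<depot>` and that a secret reference `<tenant>:<secret>` yields the catalog `<secret>`. `catalog_name` is a hypothetical helper, not a DataOS command.

```bash
# Hypothetical helper: derive the expected catalog name from a depots[] or
# catalog.config[] entry, following the naming shown in this page's examples.
catalog_name() {
  local ref="$1"
  case "$ref" in
    dataos://*)
      ref="${ref#dataos://}"     # drop the address scheme
      ref="${ref%%[?:/]*}"       # keep only the depot name
      ;;
    *:*)
      ref="${ref#*:}"            # drop the "<tenant>:" prefix of a secret reference
      ;;
  esac
  printf '%s\n' "$ref"
}
```

Compare the result with the `template` value of `TRINO_CATALOG`. For instance, `catalog_name 'dataos://lakehousedepot?purpose=rw'` prints `lakehousedepot`.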

***

### Apply and verify

Apply in dependency order:

```
source depots -> lakehouse depot -> git and tenant state/object-store secrets -> resource
```

```bash
ds resource apply -f my-trino-dp/trino-server-deploy.yaml

ds resource -t vulcan-dg-trino -n my-trino-dp get runtime
```

Confirm the cluster is up:

```bash
ds resource -t vulcan -n my-trino-dp-trino         logs -l 100
ds resource -t vulcan -n my-trino-dp-trino-workers logs -l 100
```

Look for `SERVER STARTED` in the coordinator logs.
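The log check can be scripted as a filter. This sketch assumes the `logs` subcommand writes the log text to standard output or standard error; `coordinator_started` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds if the log text on stdin contains the startup marker.
coordinator_started() {
  grep -q 'SERVER STARTED'
}

# Example, using the coordinator resource name from this page:
# ds resource -t vulcan -n my-trino-dp-trino logs -l 100 2>&1 | coordinator_started \
#   && echo "coordinator is up"
```

If the marker has scrolled out of the last 100 lines, increase the `-l` value before concluding that the coordinator failed to start.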

***

### Common issues

| Issue                                                       | Fix                                                                                                                                             |
| ----------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| Workers OOM or cluster forms with 0 workers                 | Set `-Xmx` to fit the pod memory limit in both coordinator and worker `jvmConfig`. Use `4G` for a `4Gi` limit.                                  |
| Models materialize to the wrong catalog                     | Use three-part names for model names and source reads. The default catalog is the first secret catalog if any exist, otherwise the first depot. |
| Two-part names (`schema.name`) resolving to the wrong place | Always use `catalog.schema.name` for every model, semantic, DQ `depends_on`, and source read.                                                   |
| Deploy YAML parsed as a model                               | Add the deploy YAML filename to `ignore_patterns` in `config.yaml`.                                                                             |
| `Table not found` for a cross-catalog source                | Use fully-qualified `catalog.schema.table`. Declare sources in `external_models.yaml` if the linter cannot resolve them.                        |
| Identifier casing errors                                    | Use lowercase identifiers. Trino lowercases unquoted names. Quote with `"` only when the name collides with a reserved word.                    |
| Schedule stops without warning                              | Set `endOn` one to two years out. Keep `timezone: UTC` and `concurrencyPolicy: Forbid`.                                                         |

***

### Secret catalog (Postgres example)

To use a catalog backed by a DataOS secret instead of a depot, create a secret whose keys are valid Trino catalog property names:

```yaml
name: postgresrr
version: v2alpha
type: secret
layer: user
description: "Trino PostgreSQL catalog properties."
secret:
  type: key-value
  data:
    connector.name:
    connection-url:
    connection-user:
    connection-password:
```

Apply the secret:

```bash
ds resource apply -f postgresrr.yaml
```

Reference it in the deploy manifest and point the gateway catalog at it:

```yaml
spec:
  trino:
    catalog:
      config:
        - "<tenant>:postgresrr"
  use:
    projection:
      projections:
        envVars:
          - key: TRINO_CATALOG
            template: "postgresrr"
```

A data product can run with only secret catalogs and no depots.

***

### Next steps

After configuring Managed Trino, continue with:

```
Connect to Engine -> Define models -> Validate and test locally
```

For cluster tuning, failure modes, local development, and the full configuration reference, see the Trino engine manual.

