# DataOS Lakehouse (Metadata)

DataOS Lakehouse (Iceberg) is supported as a **metadata-only** source. A `spec.type: metadata` pipeline brings Lakehouse source inventory into the DataOS metadata catalog so users can find the Lakehouse-backed datasets that exist.

Unlike Snowflake and Databricks, Lakehouse metadata pipelines publish **catalog inventory only**. The lineage, profiler, classification, and usage stages are skipped at template-render time, so the rendered workflow contains a single `metadata` node. For the field-by-field authoring contract, see [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md).

## Metadata stages

| Stage            | Status                                                            |
| ---------------- | ----------------------------------------------------------------- |
| `metadata`       | Runs: catalog, namespace, table, column, and partition inventory. |
| `lineage`        | Not generated for Lakehouse.                                      |
| `profiler`       | Not generated for Lakehouse.                                      |
| `classification` | Not generated for Lakehouse.                                      |
| `usage`          | Not generated for Lakehouse.                                      |

{% hint style="info" %}
`mode` (`shallow` or `deep`) is required on every metadata pipeline, but it does **not** change the rendered DAG for Lakehouse: only the `metadata` stage runs in either mode. Set `mode: shallow`. Because lineage, profiler, classification, and usage are never generated, `query_log_duration` and `result_limit` also have no effect.
{% endhint %}

## What Nilus brings in from Lakehouse

| What Nilus brings in | How Nilus fetches or resolves it from Lakehouse                                                                             |
| -------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| Catalog context      | Reads the Lakehouse catalog name from the connection configuration, or uses the default catalog when one is not provided.   |
| Namespaces           | Uses the Lakehouse metadata interface to list namespaces.                                                                   |
| Tables               | Lists tables inside each namespace and loads each table's metadata before publishing it to the catalog.                     |
| Columns              | Reads table schema fields so dataset detail pages can show the available columns and types.                                 |
| Partition details    | Reads the table partition specification and maps those fields into partition context.                                       |
| Table descriptions   | Reads description-like values from table metadata and properties when the source provides them.                             |
| Owners               | Reads the configured ownership property when present, then resolves it to a known user before publishing ownership context. |

Nilus does not bring in Lakehouse tags, stored procedures, query usage, or lineage through this path.

## Source options

Metadata pipelines accept only the customer-facing `source.options` keys below. Do **not** set `source_table`; Nilus assigns the stage value internally.

| Option            | Required | Description                                                                             |
| ----------------- | -------- | --------------------------------------------------------------------------------------- |
| `service_type`    | Yes      | Must be `lakehouse` (alias `iceberg`).                                                  |
| `database_filter` | No       | Restrict by catalog name. Object with `includes` / `excludes` arrays of regex patterns. |
| `schema_filter`   | No       | Restrict by namespace name. Same shape as `database_filter`.                            |
| `table_filter`    | No       | Restrict by table name. Same shape as `database_filter`.                                |

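The three filters can be combined in one `source.options` block. The fragment below is a minimal sketch that sits under `spec`. The option keys and the `includes` / `excludes` shape come from the table above, and the address is the one used in the sample config on this page. Every regex pattern is a hypothetical placeholder.

```yaml
# Hypothetical patterns; replace them with names from your own Lakehouse.
source:
  address: dataos://datalakehouse?purpose=rw
  options:
    service_type: lakehouse
    database_filter:            # matched against catalog names
      includes: ["^lakehouse$"]
    schema_filter:              # matched against namespace names
      includes: ["^analytics_"]
      excludes: ["_tmp$"]
    table_filter:               # matched against table names
      excludes: ["^stg_"]
```

This page does not state how `includes` and `excludes` interact when a name matches both, so confirm the precedence in [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md) before relying on overlapping patterns.
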
## Required permissions

Use a DataOS Lakehouse depot whose secret includes an `apikey` (a DataOS API token). Nilus passes this token to the Iceberg REST catalog as the connection token. The token needs read access to the Lakehouse catalog, namespaces, and table metadata in scope, so grant it on every catalog and namespace you expect in the inventory. If the token cannot read a particular table, that table is skipped.

## Sample Nilus config

```yaml
name: lakehouse-metadata
version: v1alpha
type: nilus
tags: [nilus, metadata]
spec:
  type: metadata
  mode: shallow
  compute: comet-compute
  schedule:
    crons:
      - "0 */6 * * *"
    concurrencyPolicy: Forbid
  source:
    address: dataos://datalakehouse?purpose=rw
    options:
      service_type: lakehouse
      schema_filter:
        includes: ["^analytics_"]
```

This resource renders a single-node `metadata` workflow. `mode` is required (set it to `shallow`), but it does not change the Lakehouse DAG. See [Metadata Sample Configs](/concepts/resources/nilus/metadata-pipelines/sample-configs.md) for more examples.

## Behavior and capabilities

* **Pipeline shape**: exactly one DAG node (`metadata`) in both modes; enrichment stages are never generated.
* **Connection**: a DataOS Lakehouse depot (`dataos://<depot>?purpose=rw`).
* **Product context**: if a Lakehouse-backed dataset is part of a productised flow, the product, semantic, and metric context comes from a separate product layer, not from this metadata path. Read the Lakehouse inventory alongside those product surfaces, not in isolation.

## Troubleshooting

| Symptom                                                                                                                      | Likely cause                                                                                                             | Resolution                                                                                                 |
| ---------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------- |
| `service_type 'lakehouse' only supports source_table ['metadata']`                                                           | A `source_table` other than `metadata` was set, or an enrichment stage was forced.                                       | Remove `source_table` from the resource. Lakehouse renders only the `metadata` stage.                      |
| No lineage / usage / profiles appear                                                                                         | Expected: Lakehouse is metadata-only today.                                                                              | Use Snowflake or Databricks if you need the enrichment stages.                                             |
| Namespaces missing from the catalog                                                                                          | The catalog/namespace is outside the configured filters or not visible to the depot.                                     | Check `database_filter`, `schema_filter`, and the depot's catalog scope.                                   |
| Run succeeds but logs show `403` / `Authentication failed` / `Not Authorized! Invalid token.` and fewer records are returned | The depot `apikey` lacks read access on some Iceberg tables. Those tables are skipped and the run still reports success. | This is expected partial-inventory behavior. Grant the token read access on the missing tables and re-run. |

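For the `source_table` error in the first row, the fix is to delete the key, not to change its value. The fragment below is a sketch that reuses the address and `service_type` from the sample config on this page. The commented-out `source_table: lineage` line is a hypothetical example of the kind of entry to remove.

```yaml
spec:
  type: metadata
  mode: shallow
  source:
    address: dataos://datalakehouse?purpose=rw
    options:
      service_type: lakehouse
      # source_table: lineage   # hypothetical offending line; delete it, Nilus assigns the stage value internally
```
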
## Related docs

* [Metadata Sources](/concepts/resources/nilus/metadata-pipelines/metadata-sources.md): all metadata-capable sources and how to scope extraction.
* [Understanding Metadata Pipelines](/concepts/resources/nilus/metadata-pipelines.md): the conceptual model.
* [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md): the `spec.type: metadata` contract and DAG anatomy.
* [Metadata Sample Configs](/concepts/resources/nilus/metadata-pipelines/sample-configs.md): ready-to-edit YAML.

