# Sample Configs

All examples below use the current Nilus pipeline shape with `spec.type: metadata`:

```yaml
type: nilus
spec:
  type: metadata
```

A metadata pipeline extracts catalog, lineage, profiler, classification, and usage information from a connected source system and loads it into the DataOS metadata catalog. You author **one** Nilus resource per source system, and Nilus renders it into a DAG behind the scenes. The required `mode` field decides how much of that DAG runs: `shallow` renders `metadata` + `lineage`, and `deep` adds `profiler`, `classification`, and `usage`, which run in parallel after the root `metadata` stage. For Lakehouse sources, only the `metadata` stage runs in either mode.

For the conceptual model, see [Understanding Metadata Pipelines](/concepts/resources/nilus/metadata-pipelines.md). For the field-by-field reference and the DAG anatomy, see [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md).

{% hint style="info" %}
**Supported sources.** Metadata pipelines are documented here for **Snowflake**, **Databricks**, and **DataOS Lakehouse** (Iceberg). Samples for other source systems are intentionally omitted until they are documented.
{% endhint %}

{% hint style="info" %}
`mode` (`shallow` or `deep`) is required on every metadata pipeline. You do **not** set `source_table` on a metadata resource; Nilus hardcodes it per DAG node. The `source.options` block accepts only the seven fields shown in these samples (`service_type`, `database_filter`, `schema_filter`, `table_filter`, `query_log_duration`, `result_limit`, `threads`); anything else is rejected at schema validation.
{% endhint %}
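
To make the accepted-versus-rejected boundary concrete, here is a rough sketch of a `source` block. The accepted keys reuse values from the Snowflake samples on this page. The commented-out keys are the ones this page names as not exposed, and their values are deliberately left elided. Treat it as an illustration of the rule above, not as a validated config.

```yaml
source:
  address: dataos://snowflake-metadata-depot?purpose=rw
  options:
    service_type: snowflake          # accepted, and required
    schema_filter:                   # accepted
      includes: ["^GOLD_"]
    threads: 4                       # accepted
    # Keys outside the seven accepted fields are rejected at schema
    # validation, so they are shown commented out here:
    # source_table: ...              # hardcoded by Nilus per DAG node
    # stored_procedure_filter: ...   # not exposed by the Nilus domain template
    # classification_filter: ...     # not exposed by the Nilus domain template
```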

### Snowflake

Before running any of the Snowflake samples, make sure the Snowflake role has the metadata-specific privileges described in [Snowflake (Metadata) → Required permissions](/concepts/resources/nilus/metadata-pipelines/metadata-sources/snowflake-metadata.md#required-permissions). Basic inventory can work with object visibility alone, but tags, query history, stored procedures and functions, lineage, and usage depend on the approved account-usage surface.

<details>

<summary>Depot-backed, deep (recommended)</summary>

`mode: deep` runs the full DAG. Switch to `mode: shallow` to run only `metadata` + `lineage`.

```yaml
name: snowflake-metadata
version: v1alpha
type: nilus
tags: [nilus, metadata]
spec:
  type: metadata
  mode: deep
  compute: comet-compute
  schedule:
    crons:
      - "0 */6 * * *"
    concurrencyPolicy: Forbid
  source:
    address: dataos://snowflake-metadata-depot?purpose=rw
    options:
      service_type: snowflake
      database_filter:
        includes: ["PROD_DB", "ANALYTICS_DB"]
      schema_filter:
        includes: ["^MODEL", "^GOLD_"]
        excludes: ["^TMP_"]
      table_filter:
        excludes: ["^_audit"]
      query_log_duration: 3
      result_limit: 10000
      threads: 4          # parallel workers for profiler/usage; raise to cut runtime
```

This single resource produces a 5-node DAG: `snowflake-metadata-metadata` (root) → `snowflake-metadata-lineage`, `snowflake-metadata-profiler`, `snowflake-metadata-classification`, `snowflake-metadata-usage` (parallel). All five stages run every 6 hours under a single workflow.

</details>

<details>

<summary>Depot-backed, shallow (inventory + lineage only)</summary>

A `shallow` pipeline keeps the source inventory and lineage current without the heavier profiling, classification, and usage scans. It is lighter, so it suits a tighter schedule. Pair it with a less frequent `deep` pipeline when you also need the full enrichment.

```yaml
name: snowflake-metadata-shallow
version: v1alpha
type: nilus
tags: [nilus, metadata]
spec:
  type: metadata
  mode: shallow
  compute: comet-compute
  schedule:
    crons:
      - "0 * * * *"
    concurrencyPolicy: Forbid
  source:
    address: dataos://snowflake-metadata-depot?purpose=rw
    options:
      service_type: snowflake
      database_filter:
        includes: ["PROD_DB", "ANALYTICS_DB"]
      schema_filter:
        includes: ["^MODEL", "^GOLD_"]
        excludes: ["^TMP_"]
      query_log_duration: 1
      result_limit: 10000
```

This resource produces a 2-node DAG: `snowflake-metadata-shallow-metadata` (root) → `snowflake-metadata-shallow-lineage`.
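
A companion `deep` pipeline for the pairing described above might look like the following sketch. The resource name and the daily cron are illustrative assumptions, not values from the Nilus documentation; everything else is copied from the samples on this page.

```yaml
name: snowflake-metadata-daily       # hypothetical name for the companion resource
version: v1alpha
type: nilus
tags: [nilus, metadata]
spec:
  type: metadata
  mode: deep                         # full DAG: metadata, lineage, profiler, classification, usage
  compute: comet-compute
  schedule:
    crons:
      - "0 2 * * *"                  # assumed cadence: once a day at 02:00, versus hourly for shallow
    concurrencyPolicy: Forbid
  source:
    address: dataos://snowflake-metadata-depot?purpose=rw
    options:
      service_type: snowflake
      database_filter:
        includes: ["PROD_DB", "ANALYTICS_DB"]
      schema_filter:
        includes: ["^MODEL", "^GOLD_"]
        excludes: ["^TMP_"]
      query_log_duration: 3
      result_limit: 10000
      threads: 4
```

Keeping the filters identical across the two resources means both pipelines describe the same scope, so the daily run enriches the same objects that the hourly run keeps current.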

</details>

<details>

<summary>Direct URI (no depot)</summary>

Use the direct `metadata+snowflake://` URI when the Snowflake account is not depot-backed. Project credentials through `spec.use.projection`.

```yaml
name: snowflake-metadata-direct
version: v1alpha
type: nilus
tags: [nilus, metadata]
spec:
  type: metadata
  mode: deep
  compute: comet-compute
  schedule:
    crons:
      - "0 */6 * * *"
    concurrencyPolicy: Forbid
  use:
    projection:
      secrets:
        - id: engineering:snowflake-secret
          contextAlias: snowsecret
      projections:
        envVars:
          - key: SF_USER
            template: "{{ secrets['snowsecret'].user | base64_decode }}"
          - key: SF_PASSWORD
            template: "{{ secrets['snowsecret'].password | base64_decode }}"
  source:
    address: metadata+snowflake://{SF_USER}:{SF_PASSWORD}@xy12345.snowflakecomputing.com/PROD_DB?warehouse=METADATA_WH&role=METADATA_RO
    options:
      service_type: snowflake
      database_filter:
        includes: ["PROD_DB"]
      schema_filter:
        includes: ["^MODEL"]
      query_log_duration: 3
      result_limit: 10000
```

</details>

### Databricks (Unity Catalog)

Databricks metadata connects only through a direct `metadata+databricks://` URI with a projected access token; there is no DataOS depot variant for Databricks. Use `service_type: databricks` for both classic and Unity Catalog deployments. The token goes in the URI password position (`token:<token>@`), and `http_path` is passed as a query parameter.

```yaml
name: databricks-metadata
version: v1alpha
type: nilus
tags: [nilus, metadata]
spec:
  type: metadata
  mode: deep
  compute: comet-compute
  schedule:
    crons:
      - "0 */6 * * *"
    concurrencyPolicy: Forbid
  use:
    projection:
      secrets:
        - id: engineering:databricks-secret
          contextAlias: dbxsecret
      projections:
        envVars:
          - key: DBX_TOKEN
            template: "{{ secrets['dbxsecret'].token | base64_decode }}"
  source:
    address: metadata+databricks://token:{DBX_TOKEN}@adb-12345.6.azuredatabricks.net?http_path=/sql/1.0/warehouses/abc123def456&catalog=main&schema=gold
    options:
      service_type: databricks
      database_filter:
        includes: ["main"]
      schema_filter:
        includes: ["^gold_", "^silver_"]
        excludes: ["^bronze_tmp_"]
      query_log_duration: 3
      result_limit: 10000
```

### DataOS Lakehouse (Iceberg)

<details>

<summary>Metadata only</summary>

```yaml
name: lakehouse-metadata
version: v1alpha
type: nilus
tags: [nilus, metadata]
spec:
  type: metadata
  mode: shallow
  compute: comet-compute
  schedule:
    crons:
      - "0 */6 * * *"
    concurrencyPolicy: Forbid
  source:
    address: dataos://datalakehouse?purpose=rw
    options:
      service_type: lakehouse
      schema_filter:
        includes: ["^analytics_"]
```

{% hint style="info" %}
When `service_type: lakehouse`, the rendered workflow contains **only** the `metadata` DAG node in both modes; the lineage, profiler, classification, and usage stages are skipped at template-render time. `mode` is still required by the schema (set `shallow`), but it does not change the rendered DAG. `query_log_duration` and `result_limit` have no effect because the stages that consume them aren't generated.
{% endhint %}

</details>

## Validation notes

* Use `type: nilus` and `spec.type: metadata` for catalog / lineage / profiler / classification / usage extraction.
* `mode` is required and must be `shallow` or `deep`. `shallow` runs `metadata` + `lineage`; `deep` adds `profiler`, `classification`, and `usage`. A metadata resource without `mode` fails schema validation.
* `source.options.service_type` is required. This guide documents `snowflake`, `databricks`, and `lakehouse` (alias `iceberg`).
* Keep the customer-facing `source.options` block to `service_type`, `database_filter`, `schema_filter`, `table_filter`, `query_log_duration`, `result_limit`, and `threads` (parallel worker count: raise it to cut runtime on large `profiler` / `usage` workflows). Don't add `source_table`, `stored_procedure_filter`, `classification_filter`, or other lower-level engine knobs until the Nilus domain template exposes them.
* One Nilus resource yields one workflow. For `snowflake` / `databricks` that workflow is a 2-stage DAG in `shallow` and a 5-stage DAG in `deep`; for `lakehouse` it is a 1-stage DAG in both modes. Don't create separate Nilus resources to get separate stages.
* Always set `database_filter` / `schema_filter` / `table_filter` on production scopes. Unbounded sweeps over a large warehouse can take hours per stage, and the cost compounds across every scheduled run.
* `query_log_duration` and `result_limit` affect `lineage` (both modes) and `usage` (`deep` only). They're ignored by `metadata`, `profiler`, and `classification`.
* Metadata pipelines do not declare a `sink`. The catalog target is configured at the DataOS metadata-service level, and Nilus auto-attaches the catalog sink to every DAG node during translation.
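
As a quick reference for the notes above, the sketch below annotates each `source.options` field with the stages that, according to this page, consume it. Values are copied from the Snowflake deep sample. The comments summarize this page only and are not an exhaustive field reference.

```yaml
options:
  service_type: snowflake      # required
  database_filter:             # scope filters: always set on production scopes
    includes: ["PROD_DB", "ANALYTICS_DB"]
  schema_filter:
    includes: ["^MODEL", "^GOLD_"]
    excludes: ["^TMP_"]
  table_filter:
    excludes: ["^_audit"]
  query_log_duration: 3        # used by lineage (both modes) and usage (deep only)
  result_limit: 10000          # used by lineage (both modes) and usage (deep only)
  threads: 4                   # parallel worker count; helps large profiler / usage workflows
```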

## Related docs

* [Understanding Metadata Pipelines](/concepts/resources/nilus/metadata-pipelines.md)
* [Metadata Sources](/concepts/resources/nilus/metadata-pipelines/metadata-sources.md)
* [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md)
* [Batch Sample Configs](/concepts/resources/nilus/batch/sample-configs.md)
* [CDC Sample Configs](/concepts/resources/nilus/cdc/sample-configs.md)
* [Stream Sample Configs](/concepts/resources/nilus/stream/sample-configs.md)
* [Secrets and Projections](/concepts/resources/nilus/concepts/secrets-and-projections.md)

