> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/metadata-pipelines/metadata-sources/snowflake-metadata.md).

# Snowflake (Metadata)

[Snowflake](https://docs.snowflake.com/) is the most complete metadata source for Nilus. A `spec.type: metadata` pipeline introspects Snowflake and publishes source context, databases, schemas, tables, columns, profiles, lineage, usage, and classification, into the DataOS metadata catalog **without copying any table rows**.

For batch row movement out of Snowflake, see the [Snowflake](/concepts/resources/nilus/batch/batch-sources/snowflake.md). For the field-by-field authoring contract, see [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md).

## Metadata stages

`service_type: snowflake` supports the full DAG. The required `mode` field decides how much of it runs: `shallow` runs `metadata` + `lineage`; `deep` adds `profiler`, `classification`, and `usage`. Source inventory (`metadata`) runs first; once it succeeds, the remaining stages run in parallel.

| Stage            | Runs in           | What it lands in the catalog                                                                    |
| ---------------- | ----------------- | ----------------------------------------------------------------------------------------------- |
| `metadata`       | `shallow`, `deep` | Databases, schemas, tables, views, dynamic tables, columns, and descriptions.                   |
| `lineage`        | `shallow`, `deep` | Asset and column lineage parsed from view definitions and query history.                        |
| `profiler`       | `deep`            | Per-column statistics (row counts, null counts, distinct counts, min/max, basic distributions). |
| `classification` | `deep`            | Auto-classification tags applied to columns (PII heuristics).                                   |
| `usage`          | `deep`            | Query history and popularity from the account-usage query log.                                  |

The first successful run establishes the source inventory and lineage. In `deep`, profiles, classification, and usage then deepen it. Usage typically finishes later than the other stages because Nilus reads and processes query activity over the configured lookback window. A frequent `shallow` pipeline keeps inventory and lineage fresh; pair it with a less frequent `deep` pipeline for the full profile.

## How Nilus resolves each kind of context

| What Nilus brings in            | How Nilus fetches or resolves it from Snowflake                                                                                                                  |
| ------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Databases                       | Discovery queries such as `SHOW DATABASES`, plus database comments and account context, build the source hierarchy.                                              |
| Schemas                         | Schema inspection and `information_schema.schemata`; schema tags from `ACCOUNT_USAGE.TAG_REFERENCES` when available.                                             |
| Tables and views                | `information_schema.tables` and account-usage table records, so datasets, views, and lifecycle changes land in the catalog.                                      |
| Columns                         | `information_schema.columns` and constraint metadata for names, types, nullability, defaults, comments, and key signals.                                         |
| Descriptions                    | Comments from database, schema, table, and column metadata map into catalog descriptions.                                                                        |
| Tags                            | `ACCOUNT_USAGE.TAG_REFERENCES`; non-empty tag values attach to schemas, tables, and columns.                                                                     |
| Stored procedures and functions | Account-usage procedure and function views, with `DESCRIBE PROCEDURE` / `DESCRIBE FUNCTION` as a fallback.                                                       |
| Query usage                     | `ACCOUNT_USAGE.QUERY_HISTORY` within the configured `query_log_duration` and `result_limit`.                                                                     |
| Lineage                         | Parses view definitions, eligible query-history statements, procedure child queries, and external-table locations, then resolves them against ingested metadata. |

## Source options

Metadata pipelines accept only the customer-facing `source.options` keys below. Do **not** set `source_table`, Nilus assigns a stage-specific value to each DAG node internally.

| Option               | Required | Used by stages     | Description                                                                                                                              |
| -------------------- | -------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `service_type`       | Yes      | all                | Must be `snowflake`.                                                                                                                     |
| `database_filter`    | No       | all                | Restrict by database name. Object with `includes` / `excludes` arrays of regex patterns.                                                 |
| `schema_filter`      | No       | all                | Restrict by schema name. Same shape as `database_filter`.                                                                                |
| `table_filter`       | No       | all                | Restrict by table / view name. Same shape as `database_filter`.                                                                          |
| `query_log_duration` | No       | `lineage`, `usage` | Days of query history to ingest per run. Defaults to `1`. Used by `lineage` (both modes) and `usage` (`deep` only).                      |
| `result_limit`       | No       | `lineage`, `usage` | Maximum number of query-history rows to fetch per run. Defaults to `10000000`. Used by `lineage` (both modes) and `usage` (`deep` only). |

`mode` (`shallow` or `deep`) is a required `spec` field, not a `source.options` key. See [Understanding Metadata Pipeline Config → Choosing a mode](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md#choosing-a-mode).

## Required permissions

A metadata pipeline needs broader read access than a batch read. Source inventory works with object visibility, but tags, query history, stored procedures and functions, lineage, and usage depend on Snowflake's account-usage surface.

The metadata role should cover:

* `USAGE` on the warehouse, database, and schemas in scope;
* `SELECT` on the tables, external tables, views, and dynamic tables in scope;
* access to the account-usage surfaces Nilus reads behind the scenes:
  * `snowflake.account_usage.tables` for incremental extraction;
  * `snowflake.account_usage.tag_references` for tag ingestion;
  * `snowflake.account_usage.query_history` for lineage and usage;
  * `snowflake.account_usage.procedures` and `snowflake.account_usage.functions` for stored procedures and functions.

You can satisfy the account-usage requirement with `ACCOUNTADMIN`, a role with `IMPORTED PRIVILEGES` on the `SNOWFLAKE` database, or a replicated custom account-usage schema if your platform team exposes one instead of direct `SNOWFLAKE.ACCOUNT_USAGE`.

Example grant pattern for a dedicated metadata role:

```sql
-- Dedicated role and user for the metadata workflow.
CREATE ROLE UNDERSTAND_METADATA_ROLE;
CREATE USER UNDERSTAND_METADATA_USER
  DEFAULT_ROLE = UNDERSTAND_METADATA_ROLE
  PASSWORD = '<password>';
GRANT ROLE UNDERSTAND_METADATA_ROLE TO USER UNDERSTAND_METADATA_USER;

-- Compute and source objects in scope.
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT USAGE ON DATABASE <database_name> TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO ROLE UNDERSTAND_METADATA_ROLE;

-- Assets you want the catalog to inventory.
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT SELECT ON ALL EXTERNAL TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT SELECT ON ALL VIEWS IN SCHEMA <database_name>.<schema_name> TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT SELECT ON ALL DYNAMIC TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE UNDERSTAND_METADATA_ROLE;
```

If your team uses direct `SNOWFLAKE.ACCOUNT_USAGE` access, also grant:

```sql
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE UNDERSTAND_METADATA_ROLE;
```

Key-pair authentication does not change the role requirement, the authentication method changes, but the role behind the user still needs the privileges above.

If lineage for a view is incomplete and that view depends on Snowflake tags or policies, the role may also need policy privileges:

```sql
-- Add only when policy-tagged views require them for complete lineage.
GRANT APPLY MASKING POLICY TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT APPLY ROW ACCESS POLICY TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT APPLY AGGREGATION POLICY TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT APPLY PROJECTION POLICY TO ROLE UNDERSTAND_METADATA_ROLE;
GRANT APPLY TAG TO ROLE UNDERSTAND_METADATA_ROLE;
```

## Sample Nilus config

Depot-backed (recommended for production):

```yaml
name: snowflake-metadata
version: v1alpha
type: nilus
tags: [nilus, metadata]
description: Catalog Snowflake metadata, schema, lineage, and query usage
spec:
  type: metadata
  mode: deep
  compute: comet-compute
  schedule:
    crons:
      - "0 */6 * * *"
    concurrencyPolicy: Forbid
  source:
    address: dataos://snowflake-metadata-depot?purpose=rw
    options:
      service_type: snowflake
      database_filter:
        includes: ["PROD_DB", "ANALYTICS_DB"]
      schema_filter:
        includes: ["^MODEL", "^GOLD_"]
        excludes: ["^TMP_"]
      table_filter:
        excludes: ["^_audit"]
      query_log_duration: 3
      result_limit: 10000
```

With `mode: deep`, this single resource produces a five-node DAG: `snowflake-metadata-metadata` (root) → `snowflake-metadata-lineage`, `snowflake-metadata-profiler`, `snowflake-metadata-classification`, `snowflake-metadata-usage` (parallel). Switch to `mode: shallow` for a 2-node `metadata` + `lineage` DAG. For a direct (no-depot) URI variant and a shallow example, see [Metadata Sample Configs](/concepts/resources/nilus/metadata-pipelines/sample-configs.md).

## Behavior and capabilities

* **Identifier case**: Snowflake stores unquoted identifiers in upper-case. Use upper-case forms in filter patterns to match how Snowflake reports object names.
* **Connection**: use a Snowflake depot (`dataos://<depot>?purpose=rw`) or a direct `metadata+snowflake://...` URI with projected credentials. A depot bundles the account, database, and warehouse (in the depot spec) with the username, optional role, and credentials (in a referenced secret) in one place, so the workflow references it instead of embedding connection details. Keep the depot name stable: the catalog tracks the source identity that arrives through it, and renaming it can fork the source entry.
* **Scope discipline**: an unfiltered metadata workflow against a large warehouse can take hours per stage, and that cost recurs on every scheduled run. Always set `database_filter` / `schema_filter` / `table_filter` in production.

## Troubleshooting

| Symptom                                                | Likely cause                                                        | Resolution                                                                                                                               |
| ------------------------------------------------------ | ------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Inventory lands but tags, lineage, and usage are empty | The role lacks account-usage access.                                | Grant `IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE` (or expose a replicated account-usage schema).                                         |
| Lineage incomplete for policy-tagged views             | The role lacks the relevant policy/tag privileges.                  | Add the `APPLY ... POLICY` / `APPLY TAG` grants shown above.                                                                             |
| A stage runs for hours                                 | The extraction scope is unbounded.                                  | Tighten `database_filter` / `schema_filter` / `table_filter`.                                                                            |
| Usage finishes long after inventory                    | Usage reads and processes query history across the lookback window. | Expected, inventory is visible in the catalog before usage completes. Lower `query_log_duration` / `result_limit` to shorten the window. |

## Related docs

* [Snowflake](/concepts/resources/nilus/batch/batch-sources/snowflake.md): batch row movement out of Snowflake.
* [Metadata Sources](/concepts/resources/nilus/metadata-pipelines/metadata-sources.md): all metadata-capable sources and how to scope extraction.
* [Understanding Metadata Pipelines](/concepts/resources/nilus/metadata-pipelines.md): the conceptual model.
* [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md): the `spec.type: metadata` contract and DAG anatomy.
* [Metadata Sample Configs](/concepts/resources/nilus/metadata-pipelines/sample-configs.md): ready-to-edit YAML.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/metadata-pipelines/metadata-sources/snowflake-metadata.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
