# Metadata Sources

Nilus metadata sources are upstream systems that Nilus can introspect with a `spec.type: metadata` pipeline. A metadata pipeline does not move table rows into a destination. It publishes source inventory and, where supported, lineage, profiling, classification, and usage context into the DataOS metadata catalog.

Every metadata pipeline must set `mode`. On sources that support every stage, `shallow` runs source inventory (`metadata`) and `lineage`, and `deep` adds `profiler`, `classification`, and `usage`. Use this page to decide whether a source is metadata-capable. Use [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md) for the field-by-field YAML contract and [Metadata Sample Configs](/concepts/resources/nilus/metadata-pipelines/sample-configs.md) for ready-to-edit examples.

## Supported Metadata Sources

| Source                                                                                                                    | `service_type` | `shallow` stages      | `deep` stages                                                | Notes                                                                                                                                                                                                            |
| ------------------------------------------------------------------------------------------------------------------------- | -------------- | --------------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Snowflake (Metadata)](/concepts/resources/nilus/metadata-pipelines/metadata-sources/snowflake-metadata.md)               | `snowflake`    | `metadata`, `lineage` | `metadata`, `lineage`, `profiler`, `classification`, `usage` | Use a Snowflake depot or a `metadata+snowflake://...` URI. Grant access to object metadata, account-usage views, query history, tags, procedures, and functions when lineage and usage are required.             |
| [Databricks (Metadata)](/concepts/resources/nilus/metadata-pipelines/metadata-sources/databricks-metadata.md)             | `databricks`   | `metadata`, `lineage` | `metadata`, `lineage`, `profiler`, `classification`, `usage` | Connect through a `metadata+databricks://...` URI; there is no DataOS depot variant for Databricks. For Unity Catalog, grant the token principal catalog, schema, and table visibility for the extraction scope. |
| [DataOS Lakehouse (Metadata)](/concepts/resources/nilus/metadata-pipelines/metadata-sources/dataos-lakehouse-metadata.md) | `lakehouse`    | `metadata` only       | `metadata` only                                              | Use a DataOS Lakehouse depot. The Lakehouse metadata path publishes catalog inventory only in both modes; lineage, profiler, classification, and usage are skipped.                                              |
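
The table above can be read as a lookup from `service_type` and `mode` to the stages that run. The following is a minimal Python sketch of that lookup, intended only as a reading aid: the stage names and `service_type` values come from the table, while `STAGES` and `stages_for` are hypothetical names and are not part of Nilus.

```python
# Stage lists copied from the "Supported Metadata Sources" table.
# STAGES and stages_for are illustrative names, not a Nilus API.
SHALLOW = ("metadata", "lineage")
DEEP = ("metadata", "lineage", "profiler", "classification", "usage")

STAGES = {
    "snowflake": {"shallow": SHALLOW, "deep": DEEP},
    "databricks": {"shallow": SHALLOW, "deep": DEEP},
    # Lakehouse publishes catalog inventory only, in both modes.
    "lakehouse": {"shallow": ("metadata",), "deep": ("metadata",)},
}


def stages_for(service_type: str, mode: str) -> tuple[str, ...]:
    """Return the stages the table lists for a service_type and mode."""
    try:
        return STAGES[service_type][mode]
    except KeyError:
        raise ValueError(
            f"unsupported combination: service_type={service_type!r}, mode={mode!r}"
        ) from None
```

For example, `stages_for("lakehouse", "deep")` returns only the inventory stage, which matches the note that enrichment stages are skipped for Lakehouse.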

## How To Choose Scope

Metadata extraction should be deliberately bounded in production:

* Use `mode` to choose how much runs: `shallow` for inventory and lineage on a tight schedule, `deep` for full enrichment on a less frequent schedule.
* Use `database_filter` to limit the database, project, or catalog scope.
* Use `schema_filter` to include only the schemas or namespaces that should appear in the catalog.
* Use `table_filter` to exclude temporary, audit, staging, or other low-value tables.
* Use `query_log_duration` and `result_limit` only for warehouse sources where lineage and usage are needed.

One authored Nilus resource becomes one workflow. For Snowflake and Databricks, Nilus runs the source inventory stage first and, once inventory succeeds, runs the remaining stages that the mode selects: lineage in `shallow`; lineage, profiler, classification, and usage in `deep`. For DataOS Lakehouse, Nilus renders only the inventory stage in either mode.
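
The ordering rule in the paragraph above, inventory first and the other stages only after inventory succeeds, can be sketched as follows. This is an illustration under assumptions, not the Nilus scheduler: `run_workflow` and the `run_stage` callback are hypothetical. The sketch runs the remaining stages one after another and keeps going if one of them fails; this page does not state how Nilus orders or handles failures among those later stages.

```python
from typing import Callable


def run_workflow(stages: list[str], run_stage: Callable[[str], bool]) -> list[str]:
    """Run inventory first, then the remaining stages only if inventory succeeds.

    `run_stage` is a hypothetical callback that returns True on success.
    Returns the names of the stages that completed.
    """
    if not stages or stages[0] != "metadata":
        raise ValueError("the inventory stage ('metadata') must come first")

    completed: list[str] = []
    if not run_stage("metadata"):
        # Inventory failed: no later stage is attempted.
        return completed
    completed.append("metadata")

    # Assumption: later stages run sequentially and independently of each other.
    for stage in stages[1:]:
        if run_stage(stage):
            completed.append(stage)
    return completed
```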

## Before You Configure A Source

* Confirm that the connection has metadata-read permissions, not just data-read permissions.
* Confirm that query-history access is available before expecting lineage or usage output.
* For Snowflake, confirm that the role can read the approved account-usage surface before expecting tags, query history, stored procedure/function context, lineage, or usage output.
* Keep the source identity stable. Changing the depot name or service identity can create a new source entry in the catalog instead of updating the existing one.
* Do not set `source_table` on a metadata resource. Nilus assigns stage-specific values internally.
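
Some of the rules on this page can be checked mechanically before a resource is applied. The sketch below is a hypothetical pre-flight check, not a Nilus feature. It assumes the relevant fields have already been loaded into a flat Python dictionary, which may not match the real YAML layout, and it treats Snowflake and Databricks as the "warehouse sources" for which `query_log_duration` and `result_limit` are meant.

```python
VALID_MODES = {"shallow", "deep"}
# Assumption: "warehouse sources" means the two sources that support lineage and usage.
WAREHOUSE_SERVICE_TYPES = {"snowflake", "databricks"}
QUERY_LOG_FIELDS = ("query_log_duration", "result_limit")


def check_metadata_config(config: dict) -> list[str]:
    """Return a list of problems found in a flat, hypothetical config dictionary."""
    problems: list[str] = []

    # `mode` is required and must be one of the two documented values.
    mode = config.get("mode")
    if mode not in VALID_MODES:
        problems.append(f"mode must be 'shallow' or 'deep', got {mode!r}")

    # Nilus assigns stage-specific source_table values internally.
    if "source_table" in config:
        problems.append("source_table must not be set on a metadata resource")

    # Query-log settings only make sense where lineage and usage run.
    if config.get("service_type") not in WAREHOUSE_SERVICE_TYPES:
        for field in QUERY_LOG_FIELDS:
            if field in config:
                problems.append(f"{field} applies only to warehouse sources")

    return problems
```

An empty list means none of these particular rules were violated; it does not confirm that permissions or query-history access are in place, which still have to be verified against the source.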

## Connector Page Structure

Each metadata-capable source page should include requirements, metadata-specific permissions, supported metadata stages, sample Nilus config, behavior and capabilities, troubleshooting, and related docs.

## Related Docs

* [Understanding Metadata Pipelines](/concepts/resources/nilus/metadata-pipelines.md)
* [Understanding Metadata Pipeline Config](/concepts/resources/nilus/metadata-pipelines/pipeline-config.md)
* [Metadata Sample Configs](/concepts/resources/nilus/metadata-pipelines/sample-configs.md)

