> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/snowflake.md).

# Snowflake

[Snowflake](https://docs.snowflake.com/) is a cloud-native data warehouse. Nilus reads from Snowflake as a **batch source** through the standard Snowflake SQL path. The batch connector supports both password authentication and key-pair authentication, with a virtual warehouse driving query execution.

For information on writing **into** Snowflake, see the [Snowflake](/concepts/resources/nilus/destinations/cloud-warehouses/snowflake.md).

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach `<account>.snowflakecomputing.com` on TCP `443`.
* A running virtual warehouse is required for every read; warehouse size and auto-suspend behavior directly affect ingestion cost and latency.
* For key-pair authentication, the public key must already be registered against the Snowflake user (`ALTER USER ... SET RSA_PUBLIC_KEY = '...'`).

### Required parameters

| Parameter                | Required    | Default              | Description                                                                                                                                   |
| ------------------------ | ----------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `account`                | Yes         | -                    | Snowflake account identifier (e.g. `xy12345.us-west-2.snowflakecomputing.com` or `xy12345`).                                                  |
| `username`               | Yes         | -                    | Snowflake user with access to the warehouse, database, and schema.                                                                            |
| `password`               | Conditional | -                    | Password, required for password authentication. Omit when using key-pair auth.                                                                |
| `private_key`            | Conditional | -                    | Base64-encoded PEM private key, required for key-pair authentication. Passed as a URI query parameter.                                        |
| `private_key_passphrase` | No          | -                    | Passphrase for an encrypted private key, when applicable. The connector decodes the key on the runtime side and re-encodes it for the driver. |
| `database`               | Yes         | -                    | Default database for the connection.                                                                                                          |
| `warehouse`              | Yes         | -                    | Virtual warehouse to use for all reads.                                                                                                       |
| `role`                   | Recommended | account default role | Role for the connection. Strongly recommended to set explicitly so behavior is reproducible across environments.                              |

### Database-side permissions

```sql
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE <role_name>;
GRANT USAGE ON DATABASE <database_name> TO ROLE <role_name>;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO ROLE <role_name>;
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE <role_name>;
GRANT SELECT ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE <role_name>;
GRANT ROLE <role_name> TO USER <username>;
```

The `FUTURE TABLES` grant ensures the user keeps `SELECT` on tables created in the schema after the grant.

### URI format

Password authentication:

```
snowflake://<username>:<password>@<account>/<database>?warehouse=<warehouse>&role=<role>
```

Key-pair authentication:

```
snowflake://<username>@<account>/<database>?warehouse=<warehouse>&role=<role>&private_key=<base64-pem>&private_key_passphrase=<passphrase>
```

Depot-backed equivalent (recommended for production):

```
dataos://<snowflake-depot>
```

## Source options

| Option                 | Required | Description                                                                                                                                                                    |
| ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `source_table`         | Yes      | Fully qualified table name in `<schema>.<table>` form. Add an explicit database prefix (`<database>.<schema>.<table>`) only when reading across databases.                     |
| `incremental_key`      | No       | Timestamp or numeric column used to identify newly visible rows for each run. Snowflake's `LAST_MODIFIED_AT`-style columns or sequence-backed `id` are common picks.           |
| `interval_start`       | No       | Optional ISO-8601 lower bound for the extraction window.                                                                                                                       |
| `interval_end`         | No       | Optional ISO-8601 upper bound for the extraction window.                                                                                                                       |
| `page_size`            | No       | Rows per extraction batch (default `50000`).                                                                                                                                   |
| `sql_reflection_level` | No       | `full` (default) or reduced, controls how thoroughly Nilus reflects the source schema before extraction.                                                                       |
| `sql_limit`            | No       | Caps total rows extracted per run. Use for sampling and validation.                                                                                                            |
| `sql_exclude_columns`  | No       | Comma-separated column names to skip during extraction.                                                                                                                        |
| `type_hints`           | No       | Object map of `column_name: <type>` to override inferred types. Supported types: `text`, `bigint`, `bool`, `timestamp`, `date`, `decimal`, `double`, `binary`, `json`, `time`. |
| `max_table_nesting`    | No       | String. `"0"` for fully flattened analytics-friendly output (default for SQL sources).                                                                                         |

> **Note** Snowflake folds unquoted identifiers to upper-case. Quote identifiers in DDL and in `source_table` consistently, or stick to upper-case in both.

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch, incremental ingestion to Lakehouse (password auth)

```yaml
name: nilus-snowflake-batch
version: v1alpha
type: nilus
description: Snowflake → DataOS Lakehouse incremental snapshot
spec:
  type: batch
  compute: runnable-default
  source:
    address: dataos://snowflake-depot
    options:
      source_table: ANALYTICS.ORDERS
      incremental_key: LAST_MODIFIED_AT
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_raw
      incremental_strategy: merge
      aws_region: us-west-2
```

### Batch, direct URI with key-pair authentication

```yaml
spec:
  type: batch
  source:
    address: snowflake://nilus_user@xy12345.us-west-2/ANALYTICS?warehouse=COMPUTE_WH&role=ANALYST&private_key={SNOWFLAKE_PRIVATE_KEY_B64}
    options:
      source_table: ANALYTICS.ORDERS
      incremental_key: LAST_MODIFIED_AT
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_raw
      incremental_strategy: merge
```

## Behavior and capabilities

* **Compute model**: Nilus connects through the Snowflake SQLAlchemy dialect and submits parameterized SQL reads against the configured warehouse.
* **Object model**: Snowflake databases, schemas, tables, and views, addressed as `<schema>.<table>` (or `<database>.<schema>.<table>` when crossing databases).
* **Pipeline mode**: this page documents Snowflake batch extraction. Snowflake also supports a metadata pipeline that catalogs the source without copying rows; see [Snowflake (Metadata)](/concepts/resources/nilus/metadata-pipelines/metadata-sources/snowflake-metadata.md).
* **Authentication modes**: password and key-pair. Key-pair is strongly recommended for production: the connector decodes the supplied PEM, re-encodes the DER form, and hands it to the driver in the format Snowflake expects.
* **Identifier case**: Snowflake stores unquoted identifiers in upper-case. Use upper-case forms in `source_table` to match the catalog, or quote consistently throughout your pipeline.
* **Custom queries**: supply `source_table: "query:SELECT ... FROM ..."` to extract from a hand-authored SQL query.
* **Cost control**: Snowflake bills per warehouse-second of execution. Lower-cost approaches: keep `incremental_key` enabled so each run scans only fresh data; let the warehouse auto-suspend after each run; consider a dedicated XS or SMALL warehouse for ingestion.

## Troubleshooting

| Symptom                                                          | Likely cause                                                                                                                    | Resolution                                                                                                                                                |
| ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `390100 Authentication failed` (password auth)                   | Wrong username, wrong password, account locked, or MFA enforced.                                                                | Re-check credentials; for accounts with MFA enforced, switch to key-pair auth, which is exempt from interactive MFA.                                      |
| `JWT token is invalid` (key-pair auth)                           | Private key not registered against the user, key passphrase mismatch, or the key is in the wrong PEM format.                    | Confirm `ALTER USER ... SET RSA_PUBLIC_KEY = '...'` was run with the matching public key; verify the passphrase; ensure the private key is in PKCS#8 PEM. |
| `No active warehouse selected in the current session`            | `warehouse` query parameter omitted, or the role does not have `USAGE` on the warehouse.                                        | Always set `warehouse=<name>` in the URI; grant `USAGE ON WAREHOUSE <name> TO ROLE <role>`.                                                               |
| `Object 'X.Y.Z' does not exist or not authorized`                | Identifier case mismatch (most common: lower-case in manifest, upper-case in catalog), or the role lacks `SELECT` on the table. | Use the upper-case form in `source_table`, or quote consistently; grant `SELECT` to the role.                                                             |
| Run stalls at "warehouse resuming" for several minutes           | The warehouse was suspended and is starting from cold; autoresume is enabled but warehouse start is sequential.                 | Use a dedicated, smaller warehouse for ingestion to keep start time short, or set `INITIALLY_SUSPENDED = FALSE` for the duration of a backfill.           |
| Run extracts the full table every time despite `incremental_key` | The column is not monotonically increasing in the source, or the pipeline state was reset between runs.                         | Pick a column that is genuinely monotonic.                                                                                                                |
| Costs spike during ingestion                                     | Default warehouse is too large, or `auto_suspend` interval too long, leaving the warehouse running between batches.             | Use a small dedicated ingestion warehouse with `AUTO_SUSPEND = 60`.                                                                                       |

## Related docs

* [Snowflake](/concepts/resources/nilus/destinations/cloud-warehouses/snowflake.md): companion destination connector.
* [Snowflake key-pair authentication](/concepts/resources/nilus/troubleshooting/snowflake-key-pair-authentication.md): recommended password-less authentication setup.
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy`, `partition_by`, `cluster_by`, and other shape settings.
* [Snowflake (Metadata)](/concepts/resources/nilus/metadata-pipelines/metadata-sources/snowflake-metadata.md): catalog Snowflake without copying rows via a `spec.type: metadata` pipeline.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/snowflake.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
