> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/aws-redshift.md).

# AWS Redshift

[Amazon Redshift](https://docs.aws.amazon.com/redshift/) is AWS's columnar data warehouse. Nilus reads from Redshift as a **batch source** through the PostgreSQL-compatible wire protocol, under the hood the SQLAlchemy `redshift+psycopg2://` dialect is used, so anything that works on a `psql` client works for Nilus too.

For information on writing **into** Redshift, see the [Redshift](/concepts/resources/nilus/destinations/cloud-warehouses/redshift.md).

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach the Redshift cluster's leader node on TCP `5439` (default). For Redshift Serverless, the endpoint is `<workgroup>.<account-id>.<region>.redshift-serverless.amazonaws.com:5439`. For provisioned clusters, it is `<cluster>.<account-id>.<region>.redshift.amazonaws.com:5439`.
* Inbound network access from the runtime VPC. Either: a) the cluster is in the same VPC and a security-group rule allows the runtime IPs on port 5439, or b) the cluster has a public endpoint and a security-group rule allows the runtime's egress IPs.
* If the cluster requires SSL (recommended; on by default for new clusters), the runtime trusts AWS's standard Redshift CA bundle. No customer action required for the standard CA.

### Required parameters

| Parameter  | Required | Default | Description                                                                                                           |
| ---------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------------- |
| `host`     | Yes      | -       | Cluster endpoint hostname (provisioned or serverless).                                                                |
| `port`     | No       | `5439`  | TCP port for the wire protocol.                                                                                       |
| `username` | Yes      | -       | Redshift user with `SELECT` on the target tables.                                                                     |
| `password` | Yes      | -       | Password for the user.                                                                                                |
| `database` | Yes      | -       | Database name on the cluster (Redshift clusters have one database; serverless workgroups one database per workgroup). |

### Database-side permissions

```sql
CREATE USER nilus_reader WITH PASSWORD '<strong-random>';
GRANT USAGE ON SCHEMA <schema_name> TO nilus_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO nilus_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name>
  GRANT SELECT ON TABLES TO nilus_reader;
```

The `ALTER DEFAULT PRIVILEGES` line ensures Nilus continues to have `SELECT` on tables created in the schema after the grant.

### URI format

```
redshift+psycopg2://<username>:<password>@<host>:5439/<database>?sslmode=require
```

`redshift://` is the canonical scheme but `redshift+psycopg2://` is what Nilus's connector registry dispatches on (both work). `sslmode=require` is strongly recommended for production clusters reachable over the public internet.

Depot-backed equivalent (recommended for production):

```
dataos://<redshift-depot>
```

The depot carries host, port, database, and the credential secret in one place.

## Source options

| Option                 | Required | Description                                                                                                                                                                    |
| ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `source_table`         | Yes      | Fully qualified table name in `<schema>.<table>` form. Cross-database reads are not supported on Redshift, each manifest reads from one database.                              |
| `incremental_key`      | No       | Timestamp or numeric column used to identify newly visible rows for each run. `LOAD_TIMESTAMP`-style audit columns and sequence-backed `id` columns are the standard picks.    |
| `interval_start`       | No       | Optional ISO-8601 lower bound for the extraction window.                                                                                                                       |
| `interval_end`         | No       | Optional ISO-8601 upper bound for the extraction window.                                                                                                                       |
| `page_size`            | No       | Rows per extraction batch (default `50000`). For wide rows, lower this to keep per-batch memory bounded.                                                                       |
| `sql_reflection_level` | No       | `full` (default) or reduced, controls how thoroughly Nilus reflects the source schema before extraction.                                                                       |
| `sql_limit`            | No       | Caps total rows extracted per run. Use for sampling and validation.                                                                                                            |
| `sql_exclude_columns`  | No       | Comma-separated column names to skip during extraction.                                                                                                                        |
| `type_hints`           | No       | Object map of `column_name: <type>` to override inferred types. Supported types: `text`, `bigint`, `bool`, `timestamp`, `date`, `decimal`, `double`, `binary`, `json`, `time`. |
| `max_table_nesting`    | No       | String. `"0"` for fully flattened analytics-friendly output (default for SQL sources).                                                                                         |

{% hint style="info" %}
Redshift's `SUPER` columns containing nested JSON arrive at the connector as text. Set `type_hints: { <col>: json }` if you need the downstream pipeline to treat them as JSON rather than strings.
{% endhint %}

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch, incremental snapshot to Lakehouse

```yaml
name: nilus-redshift-batch
version: v1alpha
type: nilus
description: Redshift → DataOS Lakehouse incremental snapshot
spec:
  type: batch
  compute: runnable-default
  source:
    address: dataos://my-redshift-depot
    options:
      source_table: analytics.orders
      incremental_key: load_ts
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_raw
      incremental_strategy: merge
      loader_file_format: parquet
```

### Batch, direct URI with projected secret

```yaml
spec:
  use:
    projection:
      secrets:
        - id: engineering:redshift-secret
          contextAlias: rssecret
      projections:
        envVars:
          - key: RS_USER
            template: "{{ secrets['rssecret'].user | base64_decode }}"
          - key: RS_PASSWORD
            template: "{{ secrets['rssecret'].password | base64_decode }}"
  source:
    address: redshift+psycopg2://{RS_USER}:{RS_PASSWORD}@my-cluster.123456789012.us-east-1.redshift.amazonaws.com:5439/prod?sslmode=require
    options:
      source_table: analytics.orders
      incremental_key: load_ts
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_raw
      incremental_strategy: merge
```

## Behavior and capabilities

* **Compute model**: Nilus connects through the SQLAlchemy `redshift+psycopg2` dialect and submits parameterized SQL reads against the cluster's leader node. Redshift's distributed compute resolves the query plan; the connector consumes the rowset over the wire.
* **Object model**: Redshift databases, schemas, tables, and late-binding views, addressed as `<schema>.<table>`.
* **Pipeline mode**: `batch` only on this page.
* **Authentication modes**: username + password. IAM-based auth (temporary credentials via `GetClusterCredentials`) is not currently wired through the connector; rotate passwords on a regular cadence and store them in a DataOS depot or secret.
* **Custom queries**: supply `source_table: "query:SELECT ... FROM ..."` to extract from a hand-authored SQL query. Useful for joining across schemas or pre-aggregating before extraction.
* **Concurrency**: Redshift charges per query-second of cluster work. Heavy ingestion windows can compete with downstream BI workloads; consider running on Redshift Serverless or a dedicated WLM queue for ingestion.

## Troubleshooting

| Symptom                                                          | Likely cause                                                                                            | Resolution                                                                                                                           |
| ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `could not connect to server: Connection timed out`              | Network path is blocked, runtime can't reach the leader node.                                           | Verify security-group rule on port 5439, VPC peering / Transit Gateway routes, or move the cluster into the same VPC as the runtime. |
| `FATAL: password authentication failed for user`                 | Wrong username or password, or the user's password was rotated.                                         | Re-check credentials; rotate the depot secret.                                                                                       |
| `permission denied for table <schema>.<table>`                   | The Nilus user has `USAGE` on the schema but not `SELECT` on the table.                                 | Run `GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO nilus_reader;` and the `ALTER DEFAULT PRIVILEGES` line above.             |
| `relation "<schema>.<table>" does not exist`                     | The table was dropped, renamed, or lives in a different schema than the manifest.                       | Check the `<schema>.<table>` value; Redshift identifiers are case-folded to lower-case unless quoted.                                |
| Run extracts the full table every time despite `incremental_key` | The column is not monotonically increasing in the source, or the pipeline state was reset between runs. | Pick a column that is genuinely monotonic. Sequence-backed `id` or `load_ts` columns are typical.                                    |
| `out of memory` errors during extraction                         | Wide rows (many `VARCHAR(MAX)` / `SUPER` columns) combined with a large `page_size`.                    | Lower `page_size`; restrict the column list with `sql_exclude_columns` or a `query:` source table.                                   |
| Extraction is slow despite a small row count                     | Cluster is busy with concurrent BI workloads; ingestion query is queued behind WLM.                     | Move ingestion to a dedicated WLM queue with a guaranteed slot count, or use Redshift Serverless with separate workgroup.            |

## Related docs

* [Redshift](/concepts/resources/nilus/destinations/cloud-warehouses/redshift.md): companion destination connector.
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy`, `partition_by`, `cluster_by`, and other shape settings.
* [Secrets and Projections](/concepts/resources/nilus/concepts/secrets-and-projections.md): credential projection model for direct URIs.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/aws-redshift.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
