> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/postgresql.md).

# PostgreSQL

[PostgreSQL](https://www.postgresql.org/docs/) is a mature, feature-rich open-source relational database. Nilus reads from PostgreSQL as a **batch source** through the standard PostgreSQL SQL path, with optional incremental ingestion driven by a monotonic key column.

For row-level change capture from PostgreSQL, use the [PostgreSQL (CDC)](/concepts/resources/nilus/cdc/cdc-sources/postgresql.md) instead.

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach the PostgreSQL host on the configured port (default `5432`).
* The configured user must be able to connect to the target database and `SELECT` from the target tables.
* For network-isolated PostgreSQL (RDS, Cloud SQL, private endpoints), allowlist the runtime egress IP on the database side.

### Required parameters

These can be supplied via a Depot or via Instance Secrets.

| Parameter  | Required | Default        | Description                                                                             |
| ---------- | -------- | -------------- | --------------------------------------------------------------------------------------- |
| `host`     | Yes      | -              | PostgreSQL host name.                                                                   |
| `port`     | No       | `5432`         | PostgreSQL port.                                                                        |
| `database` | Yes      | -              | Source database name.                                                                   |
| `username` | Yes      | -              | PostgreSQL user with `SELECT` access on the target tables.                              |
| `password` | Yes      | -              | Password for the PostgreSQL user.                                                       |
| `sslmode`  | No       | server default | Optional SSL mode: `disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`. |

### Database-side permissions

Grant at minimum:

```sql
GRANT CONNECT ON DATABASE <database_name> TO <username>;
GRANT USAGE ON SCHEMA <schema_name> TO <username>;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <username>;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name>
  GRANT SELECT ON TABLES TO <username>;
```

The last `ALTER DEFAULT PRIVILEGES` ensures the user keeps `SELECT` on tables created in `<schema_name>` after the grant.

### URI format

```
postgresql://<username>:<password>@<host>:<port>/<database>?sslmode=<sslmode>
```

Depot-backed equivalent (recommended for production):

```
dataos://<postgres-depot>
```

## Source options

| Option                 | Required | Description                                                                                                                                                                    |
| ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `source_table`         | Yes      | Fully qualified table name in `<schema>.<table>` form (e.g. `public.orders`).                                                                                                  |
| `incremental_key`      | No       | Timestamp or numeric column used to identify newly visible rows for each run. Use a monotonic column (`updated_at`, sequence-backed `id`).                                     |
| `interval_start`       | No       | Optional ISO-8601 lower bound for the extraction window. Useful for one-off backfills.                                                                                         |
| `interval_end`         | No       | Optional ISO-8601 upper bound for the extraction window.                                                                                                                       |
| `page_size`            | No       | Rows per extraction batch (default `50000`). Lower it for wide tables that strain runtime memory.                                                                              |
| `sql_reflection_level` | No       | `full` (default) or reduced, controls how thoroughly Nilus reflects the source schema before extraction. Reduce on very wide tables for faster startup.                        |
| `sql_limit`            | No       | Caps total rows extracted per run. Use for sampling and validation, not for production.                                                                                        |
| `sql_exclude_columns`  | No       | Comma-separated column names to skip during extraction. Useful for dropping large blob / debug columns.                                                                        |
| `type_hints`           | No       | Object map of `column_name: <type>` to override inferred types. Supported types: `text`, `bigint`, `bool`, `timestamp`, `date`, `decimal`, `double`, `binary`, `json`, `time`. |
| `max_table_nesting`    | No       | String. `"0"` for fully flattened analytics-friendly output (default for SQL sources); raise to `"1"` or `"2"` only when downstream consumers expect nested structures.        |

> **Note** There is **no** separate `schema` option. The schema is inferred from the `<schema>.<table>` form of `source_table`.

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch, incremental ingestion to Lakehouse

```yaml
name: nilus-postgres-batch
version: v1alpha
type: nilus
description: Postgres → DataOS Lakehouse incremental snapshot
spec:
  type: batch
  compute: runnable-default
  source:
    address: dataos://postgres-depot
    options:
      source_table: public.orders
      incremental_key: updated_at
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_raw
      incremental_strategy: merge
      aws_region: us-west-2
```

### Bounded backfill window

```yaml
spec:
  type: batch
  source:
    address: dataos://postgres-depot
    options:
      source_table: public.orders
      incremental_key: updated_at
      interval_start: "2024-01-01T00:00:00Z"
      interval_end: "2024-03-01T00:00:00Z"
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_q1_2024
      incremental_strategy: replace
```

## Behavior and capabilities

* **Compute model**: Nilus drives the source through the PostgreSQL SQL endpoint as a regular client. There is no replication-slot or logical-decoding interaction in batch mode.
* **Object model**: standard PostgreSQL tables (and views) addressed as `<schema>.<table>`.
* **Pipeline mode**: `batch` only on this page; for row-level change capture see the [PostgreSQL (CDC)](/concepts/resources/nilus/cdc/cdc-sources/postgresql.md).
* **Identifier case**: PostgreSQL folds unquoted identifiers to lower-case. Either keep `source_table` lower-case, or quote identifiers consistently across upstream DDL and the manifest.
* **Custom queries**: supply `source_table: "query:SELECT ... FROM ..."` to extract from a hand-authored SQL query instead of a single table. Combined with `incremental_key`, the query body should reference the column so the engine can apply the incremental filter.
* **Performance tuning**: see [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md) for the full list of source / sink shape knobs (`page_size`, `loader_file_size`, `partition_by`, `cluster_by`).

## Troubleshooting

| Symptom                                                                 | Likely cause                                                                                                                                         | Resolution                                                                                                                                                                              |
| ----------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `password authentication failed for user …`                             | Wrong credentials, user dropped, or the password rotated upstream.                                                                                   | Re-check the credentials and confirm the user still exists with the same password in PostgreSQL.                                                                                        |
| `connection to server at "<host>", port <port> failed: timeout expired` | Network path blocked (security group, firewall, private subnet).                                                                                     | Verify the runtime egress IP is allowlisted on the database side and the host/port are reachable from the runtime.                                                                      |
| `permission denied for table <name>`                                    | The user has `CONNECT` on the database but not `SELECT` on the target schema.                                                                        | Run the `GRANT SELECT ON ALL TABLES IN SCHEMA …` block in the Permissions section above and re-run.                                                                                     |
| `SSL connection has been closed unexpectedly` / `SSL SYSCALL error`     | `sslmode` does not match what the server requires; common with managed PG that enforces `require` or higher.                                         | Set `sslmode=require` (or `verify-full` with a CA bundle) in the URI.                                                                                                                   |
| Run extracts the full table every time despite `incremental_key`        | The column is not monotonically increasing in the source (rows mutated without updating `updated_at`), or the pipeline state was reset between runs. | Pick a column that is genuinely monotonic, or rely on a sequence-backed `id`. If the state was lost, the next run will reseed correctly from the latest watermark.                      |
| Wide tables OOMKill the runtime                                         | `page_size` (default `50000`) holds too much data in memory for very wide rows.                                                                      | Lower `page_size` to `25000` or `10000`; see [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md) for the full memory-bounding playbook. |

## Related docs

* [PostgreSQL (CDC)](/concepts/resources/nilus/cdc/cdc-sources/postgresql.md): companion CDC connector for row-level change capture.
* [PostgreSQL](/concepts/resources/nilus/destinations/databases/postgresql.md): companion destination connector.
* [Working with PostgreSQL partitioned tables for CDC](/concepts/resources/nilus/troubleshooting/postgresql-cdc-partitioned-tables.md): runbook for partitioned-table sources.
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy`, `partition_by`, `cluster_by`, and other shape settings.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/postgresql.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
