> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/cdc/cdc-sources/postgresql.md).

# PostgreSQL

PostgreSQL is supported as a CDC source for capturing inserts, updates, and deletes at row granularity. Nilus's CDC engine reads the Write-Ahead Log (WAL) through a logical replication slot and streams decoded row changes. Nilus persists the last acknowledged Log Sequence Number (LSN) in its REST-backed offset store so a restart resumes from exactly where the previous run left off.

Nilus connects to PostgreSQL through a DataOS depot resolved with pipeline `type: cdc`. The depot keeps host, port, credentials, and TLS material outside the pipeline manifest.

## Connectivity

| Form         | Address example              | Notes                                                                             |
| ------------ | ---------------------------- | --------------------------------------------------------------------------------- |
| DataOS depot | `dataos://my-postgres-depot` | Host, port, credentials, and TLS material are resolved from the depot definition. |

Defaults applied by Nilus:

* `plugin.name = pgoutput` (the recommended decoding plugin for PostgreSQL ≥ 10)
* `topic.prefix = cdc` (override per pipeline; the prefix is also used as the sink table prefix)
* Each change event is emitted with the new row state and timestamp metadata; deletes preserve the original primary key.

## Requirements

> *These requirements must be satisfied at the database before Nilus can run a PostgreSQL CDC pipeline.*

### Logical replication enabled

`wal_level` must be `logical` and the cluster must have enough sender / slot capacity:

```sql
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
-- Restart PostgreSQL after the first wal_level change.
```

Cloud-managed PostgreSQL flavors expose the same setting through their parameter groups:

<details>

<summary>Azure Database for PostgreSQL</summary>

Set `wal_level = LOGICAL` under **Server parameters** for the flexible / single server. The change requires a server restart.

</details>

<details>

<summary>Amazon RDS / Aurora for PostgreSQL</summary>

In the parameter group set:

* `rds.logical_replication = 1`
* Optionally tune `wal_level`, `max_wal_senders`, `max_replication_slots`, `max_connections`.

After applying, reboot the instance for the parameter group to take effect.

</details>

### Connection user privileges

The Nilus connection user needs:

* `LOGIN` and `REPLICATION` attributes.
* `SELECT` on every captured table.
* `CREATE` on the database if Nilus is allowed to create the publication automatically.

```sql
-- Inspect current settings
SELECT rolcanlogin AS can_login, rolreplication AS can_replicate
FROM pg_roles
WHERE rolname = '<username>';

-- Grant replication to an existing role
ALTER ROLE <username> WITH LOGIN REPLICATION;

-- Or create a fresh role
CREATE ROLE <username> WITH LOGIN REPLICATION PASSWORD '<password>';
GRANT SELECT ON ALL TABLES IN SCHEMA <schema> TO <username>;
GRANT CREATE ON DATABASE <database> TO <username>;
```

<details>

<summary>Amazon RDS / Aurora extra grants</summary>

The Nilus user typically needs:

* The `rds_replication` role to manage logical slots.
* Temporary `rds_superuser` (or membership in a high-privilege group) only while enabling logical replication for the first time. Drop it once the cluster is up.

</details>

### Table-level expectations

* Captured tables must have a `PRIMARY KEY`. Without one, updates and deletes cannot be reliably correlated.
* For tables without a primary key but where you still need full delete semantics, set `REPLICA IDENTITY FULL` on the table.
* `UNLOGGED` tables are not supported. Logical decoding never sees their writes.
* Schema changes (`ALTER TABLE`, etc.) are detected by the connector and reflected in the sink schema on the next event.

### Replication slot lifecycle

* Each Nilus pipeline must use a unique `slot.name`.
* The replication slot is created on first start and is *not* dropped automatically on pipeline deletion. WAL files referenced by an inactive slot accumulate on disk until the slot is removed manually.
* For `pgoutput`, Nilus also expects a publication. Either pre-create one or set `publication.autocreate.mode = filtered` so the engine creates the right publication for the include list.

## Core concepts

1. **Logical replication primitive**
   * PostgreSQL's logical replication exposes row-level changes through replication slots and a decoding plugin.
   * Nilus uses `pgoutput` (the in-tree plugin since PostgreSQL 10), which is the recommended choice for new deployments.
   * Only changes to tables with primary keys (or `REPLICA IDENTITY FULL`) are emitted in their full shape.
2. **WAL, LSN, and replay position**
   * The WAL is PostgreSQL's transaction log. Every committed change is identified by an LSN.
   * Nilus tracks the last acknowledged LSN per replication slot so PostgreSQL knows when WAL files are safe to discard.
   * LSN feedback flows continuously while events are processed and through heartbeats during idle periods.
3. **Snapshot phase**
   * On first start (or any time the connector decides a snapshot is needed) Nilus connects through JDBC and reads the captured tables consistently.
   * The snapshot emits one event per row with `op = r` and establishes the baseline schema in the sink.
   * After the snapshot completes, the connector switches to streaming mode.
4. **Streaming phase**
   * The connector reads decoded WAL records through the replication slot.
   * Decoded messages contain only row data; the connector merges them with the cached schema captured during snapshot or refreshed on `ALTER TABLE`.
   * Heartbeat messages keep the slot's `restart_lsn` advancing on idle databases.
5. **Replication slot retention**
   * The slot prevents PostgreSQL from discarding WAL needed by the last acknowledged LSN.
   * When Nilus is paused or stopped, the slot's `restart_lsn` does not advance and `pg_wal` grows.
   * Monitor slot lag with `pg_replication_slots` and `pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)`.
6. **Partitioned tables**
   * Partitioned tables require a small amount of extra setup so individual partitions are captured. See [Working with partitioned tables](/concepts/resources/nilus/troubleshooting/postgresql-cdc-partitioned-tables.md).

### Change event shape

```json
{
  "before": {"id": 1, "name": "old"},
  "after":  {"id": 1, "name": "new"},
  "source": {
    "version":   "3.4.2.Final",
    "connector": "postgresql",
    "name":      "cdc",
    "ts_ms":     1700000000123,
    "snapshot":  "false",
    "db":        "appdb",
    "schema":    "public",
    "table":     "customers",
    "lsn":       22817752,
    "txId":      567,
    "sequence":  "[null, \"22817752\"]",
    "xmin":      null
  },
  "op":    "u",
  "ts_ms": 1700000000234
}
```

PostgreSQL-specific fields surfaced through the source block:

* `lsn`, Log Sequence Number of the WAL entry.
* `txId`, PostgreSQL transaction ID; every change in the same transaction shares it.
* `sequence`, Per-transaction ordering helper.
* `xmin`, Optional, present only when the connector can expose snapshot visibility.

`op` values: `r` (read during snapshot), `c` (insert), `u` (update), `d` (delete).

## Sample Nilus config

The following example uses a DataOS depot for connectivity, captures a single schema, and writes change events to a Lakehouse sink in `flatten` mode.

<details>

<summary>Nilus YAML, PostgreSQL CDC → Lakehouse</summary>

```yaml
name: ncdc-postgres-to-lakehouse
version: v1alpha
type: nilus
tags:
  - nilus-cdc
description: Nilus CDC pipeline for PostgreSQL → DataOS Lakehouse
spec:
  type: cdc
  compute: query-default
  logLevel: INFO
  source:
    address: dataos://prod-postgres-depot
    options:
      strategy: flatten
    cdc:
      table.include.list: "public.customers,public.orders"
      slot.name: "nilus_cdc_retail"
      publication.name: "nilus_cdc_retail_pub"
      publication.autocreate.mode: "filtered"
      topic.prefix: "retail_cdc"
      heartbeat.interval.ms: 60000
      max.batch.size: 2048
      max.queue.size: 8192
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: retail_cdc_customers
      incremental_strategy: append
```

</details>

The depot referenced as `dataos://my-postgres-depot` resolves connection details (host, port, credentials, TLS) from its DataOS definition; no inline secret projection is needed in the manifest above.

## CDC options

These are the most useful CDC connector properties for PostgreSQL. They go under `source.cdc` in the Nilus manifest.

| Option                        | Default                      | Description                                                                                                                                                                                                                                                |
| ----------------------------- | ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `plugin.name`                 | `pgoutput` *(Nilus default)* | Logical decoding plugin. Use `pgoutput` on PostgreSQL ≥ 10 unless you have a specific reason to switch to `decoderbufs`.                                                                                                                                   |
| `slot.name`                   | *No default*                 | Name of the logical replication slot. Must be unique per pipeline and follow PostgreSQL's slot naming rules (lowercase, underscores).                                                                                                                      |
| `publication.name`            | `dbz_publication`            | Name of the publication used by `pgoutput`. Set explicitly so multiple Nilus pipelines do not collide.                                                                                                                                                     |
| `publication.autocreate.mode` | `all_tables`                 | When `filtered`, the engine creates a publication that exactly matches the include list. Recommended when the connection user has `CREATE` privilege.                                                                                                      |
| `topic.prefix`                | `cdc` *(Nilus default)*      | Logical namespace for this pipeline. Must be unique across CDC pipelines. Nilus also uses it as the sink table prefix.                                                                                                                                     |
| `schema.include.list`         | *No default*                 | Comma-separated regex / literals matching schemas to include. Mutually exclusive with `schema.exclude.list`.                                                                                                                                               |
| `schema.exclude.list`         | *No default*                 | Inverse of `schema.include.list`.                                                                                                                                                                                                                          |
| `table.include.list`          | *No default*                 | Comma-separated `schema.table` patterns to include. Mutually exclusive with `table.exclude.list`.                                                                                                                                                          |
| `table.exclude.list`          | *No default*                 | Inverse of `table.include.list`.                                                                                                                                                                                                                           |
| `column.include.list`         | *No default*                 | `schema.table.column` patterns to include in change event values.                                                                                                                                                                                          |
| `column.exclude.list`         | *No default*                 | Inverse of `column.include.list`.                                                                                                                                                                                                                          |
| `snapshot.mode`               | `initial`                    | Snapshot behavior on first start. `initial` snapshots when no offset is recorded; `always` re-snapshots every start; `never` skips snapshots; `initial_only` snapshots and stops; `when_needed` snapshots when the recorded offset is no longer available. |
| `heartbeat.interval.ms`       | `0` (off)                    | Periodic heartbeats keep `restart_lsn` advancing on idle databases. Recommended: `60000`.                                                                                                                                                                  |
| `max.batch.size`              | `2048`                       | Maximum number of change events processed per batch.                                                                                                                                                                                                       |
| `max.queue.size`              | `8192`                       | Maximum buffered change events before back-pressure kicks in.                                                                                                                                                                                              |
| `include.unknown.datatypes`   | `false`                      | When `true`, columns with unsupported types are emitted as binary instead of dropped.                                                                                                                                                                      |
| `offset.flush.interval.ms`    | `60000`                      | How often offsets are persisted to the Nilus REST offset store. Does not control how often LSN feedback is sent, that flows continuously while events are being processed.                                                                                 |

## Behavior and capabilities

* **Pipeline mode**: PostgreSQL CDC runs as a `cdc` source.
* Treat `topic.prefix` and `slot.name` as a stable identity. Once a pipeline is in production, changing them effectively starts a new pipeline (and orphans the old slot).
* Always set `publication.name` explicitly so it is obvious which publication belongs to which pipeline; rely on `publication.autocreate.mode = filtered` to keep it scoped to the include list.
* Monitor replication-slot lag (`pg_replication_slots.restart_lsn`, `pg_stat_replication`) and alert on growth. Inactive slots can fill `pg_wal` quickly.
* Use `heartbeat.interval.ms = 60000` for low-traffic databases so LSN feedback continues during idle periods.
* For schema changes, prefer additive migrations. Drops, type changes, and primary-key changes can require restarting the connector with a fresh snapshot.
* Capture only what you need, avoid `schema.include.list = public` on busy clusters when one or two tables would do.

## Troubleshooting

* **Symptom:** Pipeline starts but no change events arrive.
  * **Cause:** `wal_level` is not `logical`, the connection user lacks `REPLICATION`, or the publication is missing.
  * **Recovery:** Run `SHOW wal_level;`, `SELECT * FROM pg_publication WHERE pubname = '<publication.name>';`, and re-run the privilege grants above.
* **Symptom:** `pg_wal` directory grows quickly while the pipeline is paused or stopped.
  * **Cause:** The replication slot is inactive but still anchored to a stale `restart_lsn`.
  * **Recovery:** Restart Nilus so consumption resumes, or drop the slot if it is no longer needed: `SELECT pg_drop_replication_slot('<slot>');`.
* **Symptom:** Connector cannot resume; logs report a missing LSN.
  * **Cause:** WAL referenced by the slot was rotated out before consumption resumed.
  * **Recovery:** Recreate the slot (a fresh snapshot will run), then size WAL retention so the worst-case downtime fits inside it.
* **Symptom:** Updates / deletes show empty `before` blocks.
  * **Cause:** The captured table has no primary key and the default `REPLICA IDENTITY` is in effect.
  * **Recovery:** Add a primary key, or set `ALTER TABLE <table> REPLICA IDENTITY FULL;` for tables that genuinely cannot have one.
* **Symptom:** Connector fails to start with a publication conflict.
  * **Cause:** Two Nilus pipelines share the same publication name.
  * **Recovery:** Set unique `publication.name` values per pipeline and recreate the publication, or use `publication.autocreate.mode = filtered` so each pipeline owns its own.

## FAQs

<details>

<summary>What happens to the data in the replication slot after the Nilus pipeline is deleted?</summary>

Replication slots are persistent and survive Nilus deletion. They do *not* hold the change data themselves, they hold a `restart_lsn` reference that prevents PostgreSQL from recycling WAL the connector still needs.

When Nilus is not running:

* No LSN acknowledgements are sent, so `restart_lsn` does not advance.
* WAL files referenced by the slot accumulate on disk and `pg_wal` grows.

To stop accumulation, either restart Nilus (so consumption resumes) or drop the slot:

```sql
SELECT pg_drop_replication_slot('<slot>');
```

Useful queries:

```sql
SELECT * FROM pg_replication_slots;

SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replication_lag
FROM pg_replication_slots;
```

</details>

<details>

<summary>How does Nilus flush LSN feedback, and at what frequency?</summary>

LSN feedback is continuous, driven by the streaming protocol, multiple times per second during active workloads. Heartbeat messages cover the idle case.

Separately, `offset.flush.interval.ms` controls how often the connector persists its processed offset to the Nilus REST offset store (default \~60s). That setting determines the small replay window after a restart, not the LSN feedback cadence to PostgreSQL.

</details>

<details>

<summary>Why does Nilus not delete the replication slot automatically?</summary>

Two reasons:

1. **Design choice.** Slots are preserved deliberately so a pipeline can resume cleanly without re-snapshotting. Auto-deleting would risk silent data loss.
2. **Permission model.** Slot deletion typically requires superuser privileges or slot ownership. The connector user is normally not a superuser, so even if Nilus tried, PostgreSQL would refuse.

To delete a slot manually:

```sql
SELECT pg_drop_replication_slot('<slot>');
```

</details>

<details>

<summary>Does <code>offset.flush.interval.ms</code> apply, given Nilus does not run on Kafka?</summary>

Yes. `offset.flush.interval.ms` is a CDC engine setting, not a Kafka-only one. In Nilus the offset store is the REST-backed store maintained by the platform, and the property still controls how frequently the engine flushes offsets to that store. It directly affects:

* How far back Nilus replays after a restart or crash.
* The small window of events that may be re-read if the process dies before the next flush.

</details>

<details>

<summary>Does <code>offset.flush.interval.ms</code> control how often PostgreSQL releases WAL?</summary>

No. WAL retention is governed by replication feedback (LSN acknowledgements) sent over the replication connection as the connector processes events. That feedback flows continuously and through heartbeats, independent of the offset flush interval. `offset.flush.interval.ms` is purely about the durability of the consumer's position, not the cadence of LSN advancement at PostgreSQL.

</details>

## Related docs

* [Understanding CDC Pipeline Config](/concepts/resources/nilus/cdc/service-config.md)
* [CDC Sample Configs](/concepts/resources/nilus/cdc/sample-configs.md)
* [Understanding Change Data Capture](/concepts/resources/nilus/cdc.md)
* [Working with PostgreSQL Partitioned Tables for CDC](/concepts/resources/nilus/troubleshooting/postgresql-cdc-partitioned-tables.md)
* **DataOS Lakehouse destinations**: see the [AWS-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/aws-backed.md), [Azure-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/azure-backed.md), or [GCP-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/gcp-backed.md) variants for writing CDC change events into a Lakehouse.
* [PostgreSQL (Batch)](/concepts/resources/nilus/batch/batch-sources/postgresql.md)


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/cdc/cdc-sources/postgresql.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
