> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/cdc/cdc-sources/mysql.md).

# MySQL

MySQL is supported as a CDC source for capturing inserts, updates, and deletes at row granularity. Nilus's CDC engine reads the binary log (binlog) and translates each row event into a structured change event. Nilus persists the binlog position and GTID set in its REST-backed offset store, and it persists DDL through the REST schema-history store, so a restart resumes from exactly where the previous run left off without losing schema context.

Nilus connects to MySQL through a DataOS depot resolved with pipeline `type: cdc`. The depot keeps host, port, credentials, and TLS material outside the pipeline manifest.

## Connectivity

| Form         | Address example           | Notes                                                                             |
| ------------ | ------------------------- | --------------------------------------------------------------------------------- |
| DataOS depot | `dataos://my-mysql-depot` | Host, port, credentials, and TLS material are resolved from the depot definition. |

Defaults applied by Nilus:

* `topic.prefix = cdc` (override per pipeline; the prefix is also used as the sink table prefix)
* Each change event is emitted with the new row state and timestamp metadata; deletes preserve the original primary key.
* DDL is persisted through the REST schema-history store so restarts do not require a fresh snapshot.

## Requirements

> *These requirements must be satisfied at the database before Nilus can run a MySQL CDC pipeline.*

### Binary log enabled

The MySQL server must produce a row-format binlog with full row images. In `my.cnf` (or as Docker / cloud parameter group entries):

```ini
[mysqld]
server_id          = 184054        # any unique 32-bit integer per replica
log_bin            = mysql-bin
binlog_format      = ROW
binlog_row_image   = FULL
gtid_mode          = ON             # recommended; enables GTID-based recovery
enforce_gtid_consistency = ON
```

Restart MySQL for the changes to take effect. On AWS RDS / Aurora set the same parameters through the parameter group and reboot. On Azure Database for MySQL set them under server parameters.

### Connection user privileges

The Nilus connection user needs:

* `SELECT` on every captured database / table so the snapshot phase can read them.
* `RELOAD` (or `FLUSH_TABLES`) so the connector can take a consistent snapshot lock.
* `SHOW DATABASES`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` so the connector can read the binlog stream.
* `LOCK TABLES` if you intend to use the locking snapshot mode.

```sql
CREATE USER 'nilus_cdc'@'%' IDENTIFIED BY '<strong-password>';

GRANT SELECT, RELOAD, SHOW DATABASES,
      REPLICATION SLAVE, REPLICATION CLIENT,
      LOCK TABLES
  ON *.* TO 'nilus_cdc'@'%';

GRANT ALL PRIVILEGES ON appdb.*  TO 'nilus_cdc'@'%'; -- or a narrower set per database

FLUSH PRIVILEGES;
```

### MySQL 8 authentication

MySQL 8 defaults to `caching_sha2_password`. Nilus's MySQL CDC connector ships with a JDBC driver that prefers `mysql_native_password`. Pick one of the two fixes:

```sql
-- Option A: switch the connector user to the native plugin (simplest)
ALTER USER 'nilus_cdc'@'%' IDENTIFIED WITH mysql_native_password BY '<strong-password>';
FLUSH PRIVILEGES;
```

```yaml
# Option B: keep caching_sha2_password and pass the JDBC flag
spec:
  source:
    cdc:
      database.allowPublicKeyRetrieval: "true"
```

If TLS is required, configure it through the depot or pass JDBC parameters via the URI query string.

### Captured-table expectations

* Captured tables must have a `PRIMARY KEY` (or a not-null `UNIQUE KEY`). Snapshots fail for tables without one.
* The fully qualified table name is `{database}.{table}`. `table.include.list` and `table.exclude.list` are case-sensitive on case-sensitive filesystems.
* DDL is captured automatically through the schema-history store; you do not need to restart the connector for additive `ALTER TABLE`s.

### Binlog retention

MySQL purges binlog files according to `binlog_expire_logs_seconds` (or the legacy `expire_logs_days`). The retention window must be long enough to cover any planned downtime, if the connector falls behind by more than the retained binlog horizon, a fresh snapshot is required.

## Core concepts

1. **Binlog primitive**
   * The MySQL binlog records every committed row change.
   * Nilus reads the binlog through its MySQL CDC connector and decodes it into change events.
   * With `binlog_row_image = FULL`, every event carries the full before and after row image.
2. **GTIDs and binlog position**
   * With GTIDs enabled, each transaction has a globally unique identifier. Nilus persists the GTID set so it can resume cleanly even if the upstream replica topology changes.
   * Without GTIDs, the connector falls back to file + position offsets. GTIDs are recommended for new deployments.
3. **Snapshot phase**
   * On first start, Nilus snapshots the captured tables consistently using a transactional read (locking strategy is configurable through `snapshot.locking.mode`).
   * The snapshot emits one event per row with `op = r` and establishes the baseline schema in the sink.
   * After the snapshot completes the connector switches to streaming mode.
4. **Streaming phase**
   * The connector reads new binlog events as they are written.
   * DDL events are routed to the REST schema-history store so restarts do not need to replay schema from scratch.
   * Heartbeat messages keep the persisted offset advancing on idle databases.
5. **Restart semantics**
   * On restart, Nilus loads the saved offset and schema history, validates that the binlog still contains the next event, and resumes streaming.
   * If the binlog has rolled past the saved position, the connector aborts so it does not silently lose events. Recovery requires increasing binlog retention and re-running with a fresh snapshot.

### Change event shape

```json
{
  "before": {"id": 1, "name": "old"},
  "after":  {"id": 1, "name": "new"},
  "source": {
    "version":   "3.4.2.Final",
    "connector": "mysql",
    "name":      "cdc",
    "ts_ms":     1700000000123,
    "snapshot":  "false",
    "db":        "appdb",
    "table":     "customers",
    "server_id": 184054,
    "gtid":      "3E11FA47-71CA-11E1-9E33-C80AA9429562:23",
    "file":      "mysql-bin.000003",
    "pos":       4
  },
  "op":    "u",
  "ts_ms": 1700000000234
}
```

`op` values: `r` (read during snapshot), `c` (insert), `u` (update), `d` (delete). With `transforms.unwrap` the envelope is flattened to the row body and `_op`, `_table`, `_db`, and `_source.ts_ms` are appended as columns.

## Sample Nilus config

The following example uses a DataOS depot for connectivity, captures two tables, and writes change events to a Lakehouse sink in `flatten` mode.

<details>

<summary>Nilus YAML, MySQL CDC → Lakehouse</summary>

```yaml
name: ncdc-mysql-to-lakehouse
version: v1alpha
type: nilus
tags:
  - nilus-cdc
description: Nilus CDC pipeline for MySQL → DataOS Lakehouse
spec:
  type: cdc
  compute: query-default
  logLevel: INFO
  source:
    address: dataos://prod-mysql-depot
    options:
      strategy: flatten
    cdc:
      database.server.id: "184054"
      database.include.list: "appdb"
      table.include.list: "appdb.customers,appdb.orders"
      topic.prefix: "appdb_cdc"
      snapshot.mode: initial
      heartbeat.interval.ms: 60000
      max.batch.size: 2048
      max.queue.size: 8192
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: appdb_cdc_customers
      incremental_strategy: append
```

</details>

The depot referenced as `dataos://my-mysql-depot` resolves connection details (host, port, credentials, TLS) from its DataOS definition; no inline secret projection is needed in the manifest above.

## CDC options

These are the most useful CDC connector properties for MySQL. They go under `source.cdc` in the Nilus manifest.

| Option                             | Default                 | Description                                                                                                                                                                                                                                                            |
| ---------------------------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `database.server.id`               | *No default*            | Unique 32-bit integer that identifies this connector to MySQL. Must be unique among every replica and CDC connector reading from this server.                                                                                                                          |
| `database.include.list`            | *No default*            | Comma-separated regex / literals matching databases to capture. Mutually exclusive with `database.exclude.list`.                                                                                                                                                       |
| `database.exclude.list`            | *No default*            | Inverse of `database.include.list`.                                                                                                                                                                                                                                    |
| `table.include.list`               | *No default*            | Comma-separated `database.table` patterns to include. Mutually exclusive with `table.exclude.list`. Wildcards are not supported, use full names.                                                                                                                       |
| `table.exclude.list`               | *No default*            | Inverse of `table.include.list`.                                                                                                                                                                                                                                       |
| `column.include.list`              | *No default*            | `database.table.column` patterns to include in change events.                                                                                                                                                                                                          |
| `column.exclude.list`              | *No default*            | Inverse of `column.include.list`.                                                                                                                                                                                                                                      |
| `topic.prefix`                     | `cdc` *(Nilus default)* | Logical namespace for this pipeline. Must be unique across CDC pipelines. Nilus also uses it as the sink table prefix.                                                                                                                                                 |
| `snapshot.mode`                    | `initial`               | Snapshot behavior on first start. `initial` snapshots when no offset is recorded; `always` re-snapshots every start; `schema_only` captures only schema (skip data); `never` skips snapshots; `when_needed` snapshots when the recorded offset is no longer available. |
| `snapshot.locking.mode`            | `minimal`               | Snapshot locking strategy. `minimal` holds a global read lock briefly; `extended` locks longer; `none` skips locking (requires `binlog_row_image = FULL` and risks inconsistency).                                                                                     |
| `database.allowPublicKeyRetrieval` | `false`                 | When `true`, the JDBC driver is allowed to retrieve the server's public key over a non-TLS connection. Set to `true` for MySQL 8 with `caching_sha2_password` if you do not switch the user to `mysql_native_password`.                                                |
| `heartbeat.interval.ms`            | `0` (off)               | Periodic heartbeats keep the persisted binlog position advancing on idle databases. Recommended: `60000`.                                                                                                                                                              |
| `max.batch.size`                   | `2048`                  | Maximum number of change events processed per batch.                                                                                                                                                                                                                   |
| `max.queue.size`                   | `8192`                  | Maximum buffered change events before back-pressure kicks in.                                                                                                                                                                                                          |
| `gtid.source.includes`             | *No default*            | Optional GTID source filter; useful when reading from a replica that is downstream of another GTID-emitting source.                                                                                                                                                    |

## Behavior and capabilities

* **Pipeline mode**: MySQL CDC runs as a `cdc` source.
* Treat `database.server.id` as a stable identity. Reusing the same server ID across pipelines pointed at the same MySQL server causes the binlog client to fight for the same slot.
* Treat `topic.prefix` as a stable identity for the pipeline; changing it after first deployment effectively starts a new pipeline.
* Prefer GTID-enabled MySQL deployments. They make recovery and topology changes far simpler.
* Increase `binlog_expire_logs_seconds` (or `expire_logs_days`) so worst-case downtime fits inside binlog retention. Falling off the binlog horizon forces a fresh snapshot.
* Use narrow `database.include.list` / `table.include.list` patterns. The connector evaluates filters early, anything not on the include list is skipped during both snapshot and streaming.
* For schema changes, prefer additive migrations. Renames and column drops can require a fresh snapshot or a schema-history rebuild.
* When changing the include list, restart the connector, added tables are not picked up incrementally.

## Troubleshooting

* **Symptom:** `Public Key Retrieval is not allowed` at startup.
  * **Cause:** MySQL 8 with `caching_sha2_password` and a non-TLS connection without `database.allowPublicKeyRetrieval`.
  * **Recovery:** Either set `database.allowPublicKeyRetrieval: "true"` in `source.cdc`, or change the connector user to `mysql_native_password`.
* **Symptom:** `Access denied; you need (at least one of) the RELOAD or FLUSH_TABLES privilege(s)`.
  * **Cause:** The connector user is missing `RELOAD` (or `LOCK TABLES`) needed for the snapshot lock.
  * **Recovery:** `GRANT RELOAD, LOCK TABLES ON *.* TO '<user>'@'%';` and re-run.
* **Symptom:** `After applying the include/exclude list filters, no changes will be captured.`
  * **Cause:** Patterns in `database.include.list` / `table.include.list` do not match any object, typically a typo or case mismatch.
  * **Recovery:** Validate names with `SHOW DATABASES;` and `SHOW TABLES IN <db>;`. Patterns are case-sensitive on Linux.
* **Symptom:** Connector aborts on restart citing a missing binlog file.
  * **Cause:** The persisted binlog position has rolled out of retention.
  * **Recovery:** Increase `binlog_expire_logs_seconds` and recreate the connector with `snapshot.mode: when_needed` so it can re-snapshot.
* **Symptom:** Snapshot fails for one table with a primary-key error.
  * **Cause:** The table has no `PRIMARY KEY` and no usable `UNIQUE KEY`.
  * **Recovery:** Add a primary key, exclude the table from capture, or accept that updates / deletes for that table cannot be reliably streamed.
* **Symptom:** New tables added to `table.include.list` never appear downstream.
  * **Cause:** Filters are evaluated only at startup; the running connector ignores list edits.
  * **Recovery:** Restart the connector and confirm that the new tables snapshot before streaming resumes.

## Related docs

* [Understanding CDC Pipeline Config](/concepts/resources/nilus/cdc/service-config.md)
* [CDC Sample Configs](/concepts/resources/nilus/cdc/sample-configs.md)
* [Understanding Change Data Capture](/concepts/resources/nilus/cdc.md)
* **DataOS Lakehouse destinations**: see the [AWS-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/aws-backed.md), [Azure-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/azure-backed.md), or [GCP-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/gcp-backed.md) variants for writing CDC change events into a Lakehouse.
* [MySQL (Batch)](/concepts/resources/nilus/batch/batch-sources/mysql.md)


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/cdc/cdc-sources/mysql.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
