> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/mysql.md).

# MySQL

[MySQL](https://dev.mysql.com/doc/) is the most widely deployed open-source relational database. Nilus reads from MySQL as a **batch source** through the standard MySQL SQL path, with optional incremental ingestion driven by a monotonic key column. Internally Nilus rewrites `mysql://` to `mysql+pymysql://` before connecting.

For row-level change capture from MySQL, use the [MySQL (CDC)](/concepts/resources/nilus/cdc/cdc-sources/mysql.md) instead.

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach the MySQL host on the configured port (default `3306`).
* The configured user must be able to connect from the runtime's network and have `SELECT` on the target tables.
* For managed MySQL (RDS, Cloud SQL, Aurora, Cloud SQL for MySQL), allowlist the runtime egress IP on the database side.

### Required parameters

| Parameter  | Required | Default | Description                                                                                                                                                               |
| ---------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `host`     | Yes      | -       | MySQL server hostname or IP address.                                                                                                                                      |
| `port`     | No       | `3306`  | MySQL port.                                                                                                                                                               |
| `database` | Yes      | -       | Source database name (the URI path segment). For Depot-backed configs, prefer leaving the default empty and using fully qualified `<database>.<table>` in `source_table`. |
| `username` | Yes      | -       | MySQL user with `SELECT` access on the target tables.                                                                                                                     |
| `password` | Yes      | -       | Password for the MySQL user.                                                                                                                                              |

### Database-side permissions

```sql
GRANT SELECT ON `<database_name>`.* TO '<username>'@'<host>';
FLUSH PRIVILEGES;
```

For very fine-grained access, scope the grant to specific tables:

```sql
GRANT SELECT ON `<database_name>`.`<table_name>` TO '<username>'@'<host>';
```

### URI format

```
mysql://<username>:<password>@<host>:<port>/<database>
```

Depot-backed equivalent (recommended for production):

```
dataos://<mysql-depot>
```

## Source options

| Option                 | Required | Description                                                                                                                                                                    |
| ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `source_table`         | Yes      | Fully qualified table name in `<database>.<table>` form (e.g. `sales.orders`).                                                                                                 |
| `incremental_key`      | No       | Timestamp or numeric column used to identify newly visible rows for each run. Use a monotonic column (`updated_at`, sequence-backed `id`).                                     |
| `interval_start`       | No       | Optional ISO-8601 lower bound for the extraction window.                                                                                                                       |
| `interval_end`         | No       | Optional ISO-8601 upper bound for the extraction window.                                                                                                                       |
| `page_size`            | No       | Rows per extraction batch (default `50000`). Lower it for wide tables that strain runtime memory.                                                                              |
| `sql_reflection_level` | No       | `full` (default) or reduced, controls how thoroughly Nilus reflects the source schema before extraction.                                                                       |
| `sql_limit`            | No       | Caps total rows extracted per run. Use for sampling and validation.                                                                                                            |
| `sql_exclude_columns`  | No       | Comma-separated column names to skip during extraction.                                                                                                                        |
| `type_hints`           | No       | Object map of `column_name: <type>` to override inferred types. Supported types: `text`, `bigint`, `bool`, `timestamp`, `date`, `decimal`, `double`, `binary`, `json`, `time`. |
| `max_table_nesting`    | No       | String. `"0"` for fully flattened analytics-friendly output (default for SQL sources).                                                                                         |

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch, incremental ingestion to Lakehouse

```yaml
name: nilus-mysql-batch
version: v1alpha
type: nilus
description: MySQL → DataOS Lakehouse incremental snapshot
spec:
  type: batch
  compute: runnable-default
  source:
    address: dataos://mysql-depot
    options:
      source_table: sales.orders
      incremental_key: updated_at
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_raw
      incremental_strategy: merge
      aws_region: us-west-2
```

### Batch, direct URI with bounded backfill

```yaml
spec:
  type: batch
  source:
    address: mysql://{MYSQL_USERNAME}:{MYSQL_PASSWORD}@mysql.example.com:3306/sales
    options:
      source_table: sales.orders
      incremental_key: updated_at
      interval_start: "2024-01-01T00:00:00Z"
      interval_end: "2024-04-01T00:00:00Z"
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: sales.orders_q1_2024
      incremental_strategy: replace
```

## Behavior and capabilities

* **Compute model**: Nilus connects through SQLAlchemy + the `pymysql` driver and submits parameterized SQL reads.
* **Object model**: standard MySQL tables and views addressed as `<database>.<table>`.
* **Pipeline mode**: `batch` only on this page; for row-level change capture see the [MySQL (CDC)](/concepts/resources/nilus/cdc/cdc-sources/mysql.md).
* **URI scheme rewrite**: `mysql://` URIs are rewritten internally to `mysql+pymysql://` to bind to the right SQLAlchemy dialect. Authoring `mysql+pymysql://` directly in the manifest also works.
* **Identifier case**: MySQL identifier case sensitivity depends on the server's `lower_case_table_names` setting. Match the server convention exactly in `source_table`.
* **Custom queries**: supply `source_table: "query:SELECT ... FROM ..."` to extract from a hand-authored SQL query instead of a single table.

## Troubleshooting

| Symptom                                                                 | Likely cause                                                                                                       | Resolution                                                                                                                                        |
| ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Access denied for user '<user>'@'<host>'`                              | Wrong credentials, or the user is provisioned for a different host pattern (e.g. `'%'` vs the runtime IP).         | Re-check credentials and confirm the user is granted from `'%'` or the runtime egress range.                                                      |
| `Can't connect to MySQL server on '<host>'`                             | Network path blocked, or the server is bound to `127.0.0.1`.                                                       | Verify the runtime egress IP is allowlisted on the database side and that the server is listening on a routable interface.                        |
| `Table '<db>.<tbl>' doesn't exist` despite the table existing in the UI | Server has `lower_case_table_names=0` (case-sensitive); the manifest used a different case than the on-disk table. | Match the on-disk identifier case exactly in `source_table`.                                                                                      |
| `Lost connection to MySQL server during query`                          | Long-running extraction exceeded `wait_timeout`/`net_read_timeout`.                                                | Lower `page_size` so each batch is shorter, or have your DBA raise the relevant timeouts on the source.                                           |
| Run extracts the full table every time despite `incremental_key`        | The column is not monotonically increasing in the source, or the pipeline state was reset between runs.            | Pick a column that is genuinely monotonic; consider `updated_at` set by an `ON UPDATE CURRENT_TIMESTAMP` clause.                                  |
| Wide tables OOMKill the runtime                                         | `page_size` (default `50000`) holds too much data in memory for very wide rows.                                    | Lower `page_size` to `25000` or `10000`. See [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md). |

## Related docs

* [MySQL (CDC)](/concepts/resources/nilus/cdc/cdc-sources/mysql.md): companion CDC connector for row-level change capture.
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy`, `partition_by`, and other shape settings.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/batch/batch-sources/mysql.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
