> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/concepts/resources/nilus/cdc/cdc-sources/ms-sql-server.md).

# MS SQL Server

MS SQL Server is supported as a CDC source for capturing inserts, updates, and deletes at row granularity. Nilus's CDC engine reads the CDC change tables that SQL Server maintains for each enabled source table. Nilus persists offsets through its REST-backed offset store and persists DDL through the REST schema-history store, so a restart resumes from the last acknowledged position with the correct schema context.

Nilus connects to MS SQL Server through a DataOS depot resolved with pipeline `type: cdc`. The depot keeps host, port, credentials, and driver settings outside the pipeline manifest.

## Connectivity

| Form         | Address example           | Notes                                                                             |
| ------------ | ------------------------- | --------------------------------------------------------------------------------- |
| DataOS depot | `dataos://my-mssql-depot` | Host, port, credentials, and TLS material are resolved from the depot definition. |

Defaults applied by Nilus:

* `topic.prefix = cdc` (override per pipeline; the prefix is also used as the sink table prefix)
* `database.encrypt = false` by default; pass additional `database.*` properties through `source.cdc` if your environment requires TLS or instance-specific settings.
* Each change event is emitted with the new row state and timestamp metadata; deletes preserve the original primary key.
* DDL is persisted through the REST schema-history store.

## Requirements

> *These requirements must be satisfied at the database before Nilus can run a SQL Server CDC pipeline.*

### Enable CDC at the database and table level

```sql
USE MyDB;
GO
EXEC sys.sp_cdc_enable_db;
GO

EXEC sys.sp_cdc_enable_table
  @source_schema       = N'dbo',
  @source_name         = N'customers',
  @role_name           = NULL,    -- optional gating role
  @supports_net_changes = 0;
GO
```

Repeat `sp_cdc_enable_table` for every table Nilus is expected to capture. The role passed via `@role_name` (or `NULL` for none) controls which database principals can read the underlying CDC change tables; if you set a role, the Nilus runtime user must be a member.

### SQL Server Agent must be running

SQL Server Agent populates and trims the CDC change tables. If the Agent stops, no new change rows are produced and the pipeline silently goes idle even though source writes continue.

### Privileges

* The operator who enables CDC needs elevated privileges such as `sysadmin` and `db_owner`.
* The Nilus runtime user needs:
  * `db_datareader` (or column-scoped `SELECT`) on the captured columns.
  * Membership in the gating role configured by `@role_name`, if any.
  * `VIEW SERVER STATE` if you rely on the engine's default Agent status check; otherwise supply a custom status query through `source.cdc`.

### CDC retention

CDC change tables are pruned by the Agent's clean-up job. Retention must be long enough to cover any planned downtime, if cleanup removes change rows that Nilus still needs, a fresh snapshot is required. Default retention is three days; for production CDC pipelines, raise it explicitly.

### Schema and feature constraints

* Indexed views are not supported.
* Schema changes on captured tables require creating a new CDC capture instance; the connector transitions to it once both instances exist briefly in parallel.
* Tables without a primary key cannot stream updates and deletes reliably.

## Core concepts

1. **Change tables, not log mining**
   * SQL Server materializes committed row changes into CDC change tables under the `cdc` schema.
   * Nilus reads from those change tables rather than tailing the transaction log directly.
   * The Agent process is responsible for keeping change tables current, so its health is part of the CDC contract.
2. **Snapshot + stream flow**
   * Nilus begins with a consistent snapshot of the captured tables and then continues from CDC change rows as new transactions commit.
   * Snapshot mode is configurable through `snapshot.mode`.
   * On restart, the persisted offset is reused so the snapshot phase is skipped.
3. **Offset and schema-history persistence**
   * Offsets are persisted to the Nilus REST offset store; restart points to the last acknowledged change-table position.
   * DDL is persisted to the REST schema-history store, so the connector can rebuild table schemas without reading from the source.
4. **Capture instances and schema evolution**
   * Each captured source table maps to a CDC capture instance. Compatible schema changes (additive columns) flow through naturally.
   * Incompatible changes, column renames, type changes, primary-key changes, require a new capture instance and a deliberate cut-over.
5. **Operational risk areas**
   * Agent health, CDC retention, and capture-instance management dominate operating effort.
   * Network partitions cause the connector to retry; long partitions can push the persisted offset behind retention and force a fresh snapshot.

## Sample Nilus config

<details>

<summary>Nilus YAML, SQL Server CDC → Lakehouse</summary>

```yaml
name: ncdc-mssql-to-lakehouse
version: v1alpha
type: nilus
tags:
  - nilus-cdc
description: Nilus CDC pipeline for MS SQL Server → DataOS Lakehouse
spec:
  type: cdc
  compute: query-default
  source:
    address: dataos://prod-mssql-depot
    options:
      strategy: flatten
    cdc:
      database.names: "MyDB"
      table.include.list: "dbo.customers,dbo.orders"
      topic.prefix: "erp_cdc"
      snapshot.mode: initial
      heartbeat.interval.ms: 60000
      max.batch.size: 2048
      max.queue.size: 8192
  sink:
    address: dataos://analytics-lakehouse
    options:
      dest_table: erp_cdc_customers
      incremental_strategy: append
```

</details>

The depot referenced as `dataos://my-mssql-depot` resolves connection details (host, port, credentials, TLS) from its DataOS definition; no inline secret projection is needed in the manifest above.

## CDC options

These are the most useful CDC connector properties for SQL Server. They go under `source.cdc` in the Nilus manifest.

| Option                            | Default                   | Description                                                                                                                                                                                                                                   |
| --------------------------------- | ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `database.names`                  | *No default*              | Comma-separated list of databases to capture. SQL Server CDC connector reads from all listed databases on the same server.                                                                                                                    |
| `table.include.list`              | *No default*              | Comma-separated list of fully qualified `schema.table` identifiers to capture.                                                                                                                                                                |
| `table.exclude.list`              | *No default*              | Inverse of `table.include.list`.                                                                                                                                                                                                              |
| `column.include.list`             | *No default*              | `schema.table.column` patterns to include in change events.                                                                                                                                                                                   |
| `column.exclude.list`             | *No default*              | Inverse of `column.include.list`.                                                                                                                                                                                                             |
| `topic.prefix`                    | `cdc` *(Nilus default)*   | Logical namespace for this pipeline. Must be unique across CDC pipelines. Nilus also uses it as the sink table prefix.                                                                                                                        |
| `snapshot.mode`                   | `initial`                 | Snapshot behavior on first start. `initial` snapshots when no offset is recorded; `always` re-snapshots every start; `schema_only` captures only schema (skip data); `when_needed` snapshots when the recorded offset is no longer available. |
| `database.encrypt`                | `false` *(Nilus default)* | Set to `true` and provide trust settings if the SQL Server instance enforces TLS.                                                                                                                                                             |
| `database.trustServerCertificate` | *Driver default*          | Useful in dev environments where the server certificate is self-signed.                                                                                                                                                                       |
| `heartbeat.interval.ms`           | `0` (off)                 | Periodic heartbeats keep the persisted offset advancing on idle databases. Recommended: `60000`.                                                                                                                                              |
| `max.batch.size`                  | `2048`                    | Maximum number of change events processed per batch.                                                                                                                                                                                          |
| `max.queue.size`                  | `8192`                    | Maximum buffered change events before back-pressure kicks in.                                                                                                                                                                                 |
| `offset.flush.interval.ms`        | `60000`                   | How often offsets are persisted to the Nilus REST offset store.                                                                                                                                                                               |
| `strategy`                        | `flatten`                 | `flatten` emits row-shaped records (one row per change). `changelog` preserves the full CDC envelope. Set under `source.options`, not `source.cdc`.                                                                                           |

## Behavior and capabilities

* **Pipeline mode**: MS SQL Server CDC runs as a `cdc` source.
* Keep SQL Server Agent healthy and monitored. If it stops, Nilus stops receiving new CDC rows even though source writes continue.
* Use a narrow `table.include.list` instead of capturing every CDC-enabled table by default.
* Keep `topic.prefix` stable after the first production deployment because downstream naming depends on it.
* For incompatible schema changes, create a new capture instance and cut the pipeline over deliberately rather than expecting the existing one to adapt automatically.
* Raise CDC retention before turning on the pipeline. Three days (the default) is rarely enough for production.

## Troubleshooting

* **Symptom:** No new events arrive even though source tables are changing.
  * **Cause:** SQL Server Agent is stopped, CDC is not enabled on the table, or the table is missing from `table.include.list`.
  * **Recovery:** Verify Agent status, confirm `sys.sp_cdc_help_change_data_capture` returns the table, and redeploy with the correct include list.
* **Symptom:** Pipeline fails after a long pause.
  * **Cause:** CDC cleanup removed change rows that Nilus still needed.
  * **Recovery:** Increase CDC retention, validate the cleanup job settings, and restart the pipeline with a fresh snapshot.
* **Symptom:** Downstream columns stop matching the source after a schema change.
  * **Cause:** The source table moved to a new capture instance, or the old capture instance no longer reflects the source schema.
  * **Recovery:** Create and validate the new capture instance, then redeploy the Nilus pipeline so it transitions cleanly.
* **Symptom:** Connector errors out reading Agent status.
  * **Cause:** The runtime user lacks `VIEW SERVER STATE` and the engine's default status query is blocked.
  * **Recovery:** Grant `VIEW SERVER STATE`, or supply a custom status query through `source.cdc.snapshot.select.statement.overrides`.

## Related docs

* [Understanding CDC Pipeline Config](/concepts/resources/nilus/cdc/service-config.md)
* [CDC Sample Configs](/concepts/resources/nilus/cdc/sample-configs.md)
* [Understanding Change Data Capture](/concepts/resources/nilus/cdc.md)
* [MS SQL Server (Batch)](/concepts/resources/nilus/batch/batch-sources/ms-sql-server.md)
* [MS SQL Server](/concepts/resources/nilus/destinations/databases/ms-sql-server.md)
* **DataOS Lakehouse destinations**: see the [AWS-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/aws-backed.md), [Azure-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/azure-backed.md), or [GCP-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/gcp-backed.md) variants for writing CDC change events into a Lakehouse.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/concepts/resources/nilus/cdc/cdc-sources/ms-sql-server.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
