
# MongoDB

[MongoDB](https://www.mongodb.com/docs/) is a distributed NoSQL document database. Nilus writes into MongoDB collections for both batch and CDC pipelines and supports two incremental strategies: `replace` (delete all documents in the collection, then reinsert) and `merge` (`ReplaceOne` upsert on declared primary keys).

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach the MongoDB endpoint. For self-hosted clusters that is the configured replica-set host on TCP `27017`; for Atlas / managed MongoDB it is the connection string from the cluster console.
* For Atlas, allowlist the runtime egress IP range under **Network Access**; otherwise connections from the runtime are blocked and every operation times out with `ServerSelectionTimeoutError`.
* The destination accepts both `mongodb://` (host list) and `mongodb+srv://` (DNS SRV record) URI schemes.

### Connection parameters

| Parameter        | Required    | Default | Description                                                                                                                                         |
| ---------------- | ----------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `host`           | Yes         | -       | MongoDB server hostname (or comma-separated replica-set member list).                                                                               |
| `port`           | No          | `27017` | MongoDB port. Omit when using `mongodb+srv://`.                                                                                                     |
| `username`       | Conditional | -       | Required when the cluster enforces authentication.                                                                                                  |
| `password`       | Conditional | -       | Required when the cluster enforces authentication.                                                                                                  |
| Query parameters | No          | -       | Any standard connection-string option, e.g. `tls=true`, `retryWrites=true`, `authSource=admin`, `replicaSet=rs0`. Forwarded verbatim to the driver. |

> **Important** Do **not** put the database name in the URI path. The MongoDB destination takes the database from the **first segment** of `dest_table` (`<database>.<collection>`) and overwrites any database segment present in the URI with that value.

### Permissions

The MongoDB user needs:

* Permission to authenticate against the cluster.
* Permission to read, insert, update, and delete on the target collection.
* For `replace`, permission to issue `delete_many({})` on the target collection (which is granted by the standard `readWrite` role).
* For `merge`, the same `readWrite` role is enough; `ReplaceOne(upsert=True)` covers both update and insert paths.

### URI format

```
mongodb://<username>:<password>@<host>:<port>?<optional-query-params>
```

SRV-discovery variant (Atlas-style):

```
mongodb+srv://<username>:<password>@<cluster>.mongodb.net?retryWrites=true&w=majority
```
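
If you assemble the URI yourself, reserved characters in the username or password (such as `@`, `/`, or `:`) must be percent-encoded, as MongoDB connection strings require. The following is a minimal sketch that follows the URI shapes shown above; the helper name `build_mongo_uri` and its arguments are illustrative and not part of Nilus.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(username, password, host, port=27017, options=None, srv=False):
    """Assemble a MongoDB URI with no database segment in the path.

    Hypothetical helper for illustration; Nilus only needs the final string.
    """
    # Percent-encode credentials so characters like '@' or '/' do not
    # break URI parsing.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    if srv:
        # mongodb+srv:// discovers hosts and ports through DNS, so no port is set.
        uri = f"mongodb+srv://{credentials}@{host}"
    else:
        uri = f"mongodb://{credentials}@{host}:{port}"
    if options:
        # Query parameters are appended in the same form as the examples above.
        uri += "?" + urlencode(options)
    return uri


print(build_mongo_uri("nilus", "p@ss/word", "mongo.example.com"))
# prints mongodb://nilus:p%40ss%2Fword@mongo.example.com:27017
```

Whether Nilus encodes credentials injected through `{MONGO_USERNAME}` / `{MONGO_PASSWORD}` placeholders is not stated on this page, so treat the encoding step as a precaution for hand-built URIs.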

## Sink options

| Option                 | Required | Description                                                                                                                                             |
| ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dest_table`           | Yes      | Target collection in `<database>.<collection>` form. The destination splits on the **first** `.` only, so collection names may contain additional dots. |
| `incremental_strategy` | Yes      | One of `replace` or `merge`. `append` is **not** supported and raises `Unsupported write disposition 'append' for MongoDB destination.`                 |
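
The first-dot split described for `dest_table` can be illustrated with a short sketch. This is an approximation of the documented behavior, not the Nilus source; the function name `split_dest_table` is hypothetical, and the error text is borrowed from the Troubleshooting table.

```python
def split_dest_table(dest_table):
    """Split '<database>.<collection>' on the first '.' only."""
    database, separator, collection = dest_table.partition(".")
    if not separator or not database:
        # A bare collection name carries no database segment.
        raise ValueError("Database name is required to connect to the MongoDB database.")
    if not collection:
        raise ValueError("Collection name is missing from dest_table.")
    return database, collection


print(split_dest_table("analytics.users"))        # ('analytics', 'users')
print(split_dest_table("analytics.events.2024"))  # ('analytics', 'events.2024')
```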

### Incremental strategy semantics

| Strategy  | Behavior                                                                                                                                                                                                                                       |
| --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `replace` | On the first batch of each run, the destination issues `delete_many({})` against the target collection and then inserts documents in batches of 10,000.                                                                                        |
| `merge`   | Uses `bulk_write` with `ReplaceOne({primary_keys}, doc, upsert=True)`. **Requires** the manifest to declare a `primary_key` on the dataset; otherwise the destination raises `Merge operation requires primary keys for table '<collection>'.` |
| `append`  | **Not supported.**                                                                                                                                                                                                                             |
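
To make the `replace` row concrete, here is a hedged sketch of a delete-then-insert load in fixed-size batches. It assumes a `collection` object exposing `delete_many` and `insert_many` (both exist on a `pymongo` collection); the function names are illustrative and the code is not the Nilus implementation.

```python
from itertools import islice

BATCH_SIZE = 10_000  # batch size taken from the table above


def batched(documents, size):
    """Yield lists of at most `size` documents from any iterable."""
    iterator = iter(documents)
    while batch := list(islice(iterator, size)):
        yield batch


def replace_load(collection, documents, batch_size=BATCH_SIZE):
    """Clear the collection on the first batch, then insert batch by batch."""
    inserted = 0
    for index, batch in enumerate(batched(documents, batch_size)):
        if index == 0:
            # Remove every existing document; indexes on the collection remain.
            collection.delete_many({})
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

In this sketch an empty input leaves the collection untouched, because the delete is tied to the first batch; whether Nilus behaves the same way on an empty run is an assumption to verify.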

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch ingestion with replace

```yaml
name: postgres-to-mongodb-replace
version: v1alpha
type: nilus
spec:
  type: batch
  compute: universe-compute
  source:
    address: postgresql://{PG_USERNAME}:{PG_PASSWORD}@postgres.example.com:5432/ecommerce
    options:
      source_table: public.users
  sink:
    address: mongodb://{MONGO_USERNAME}:{MONGO_PASSWORD}@mongo.example.com:27017
    options:
      dest_table: analytics.users
      incremental_strategy: replace
```

### CDC ingestion with merge (current-state mirror)

```yaml
name: postgres-cdc-to-mongodb-merge  # illustrative name
version: v1alpha
type: nilus
spec:
  type: cdc
  compute: universe-compute
  source:
    address: dataos://postgres-cdc
    cdc:
      table.include.list: "public.orders"
      topic.prefix: "orders_cdc"
  sink:
    address: mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@analytics.example.mongodb.net?retryWrites=true&w=majority
    options:
      dest_table: analytics.orders
      incremental_strategy: merge
      primary_key: id
```

## Behavior and capabilities

* **Compute model**: the Nilus runtime drives the cluster through the official `pymongo` driver; the cluster does the inserts / upserts.
* **Object model**: MongoDB databases and collections; one Nilus pipeline writes into exactly one collection.
* **Supported pipeline modes**: `batch` and `cdc`.
* **Bulk shape**: internal batch size is fixed at `10,000` documents per server round-trip; the loader batch size feeding the destination is `1,000`. Neither is user-configurable today. `merge` runs use `bulk_write(..., ordered=False)` so individual document failures do not abort the whole batch.
* **Document IDs**: incoming documents may carry an `_id` field; if absent, MongoDB auto-generates one. For `merge`, the primary keys you declare in the manifest are the upsert filter, not `_id` directly.
* **Type conversions**: numbers map to MongoDB int / long / double; ISO timestamps to `ISODate`; booleans, strings, lists, and nested objects map to the corresponding BSON types (boolean, string, array, embedded document); complex unsupported types are stringified.
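
The point that `merge` filters on the declared primary keys rather than on `_id` can be sketched as follows. This is an illustration under stated assumptions, not the Nilus source: `build_upsert_filter` and `plan_merge` are hypothetical names, and the error text is copied from the Troubleshooting table.

```python
def build_upsert_filter(document, primary_keys, collection="<collection>"):
    """Build the upsert filter from the declared primary-key fields."""
    if not primary_keys:
        raise ValueError(
            f"Merge operation requires primary keys for table '{collection}'."
        )
    # Only the primary-key fields go into the filter; '_id' is not used
    # unless it is itself declared as a primary key.
    return {key: document[key] for key in primary_keys}


def plan_merge(documents, primary_keys):
    """Return one (filter, replacement) pair per incoming document."""
    return [(build_upsert_filter(doc, primary_keys), doc) for doc in documents]


# With pymongo, each pair would become ReplaceOne(filter, replacement, upsert=True)
# and the list would be sent with collection.bulk_write(ops, ordered=False).
print(plan_merge([{"id": 7, "status": "paid"}], ["id"]))
# prints [({'id': 7}, {'id': 7, 'status': 'paid'})]
```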

## Performance considerations

* `replace` deletes every document with `delete_many({})` and reinserts the full dataset on every run; this is fine for small / medium reference data but expensive for collections with millions of documents and many indexes, because every deleted and reinserted document must also be updated in each index.
* For large CDC mirrors, prefer `merge` so MongoDB only updates changed documents; build a unique index on the primary-key fields ahead of time so the upsert lookup is bounded.
* For high-volume append-style event streams, sink to a Lakehouse / warehouse first, then sync the materialized current-state view back to MongoDB.
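
Building the unique index ahead of a `merge` pipeline might look like the sketch below. It assumes a `collection` object with a `create_index` method (as on a `pymongo` collection); the helper name `ensure_primary_key_index` is hypothetical.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_primary_key_index(collection, primary_keys):
    """Create a unique (compound) index over the primary-key fields."""
    keys = [(field, ASCENDING) for field in primary_keys]
    # unique=True lets MongoDB reject two documents sharing the same key values
    # and gives the upsert filter an index to look up against.
    return collection.create_index(keys, unique=True)
```

With `pymongo`, the call would be along the lines of `ensure_primary_key_index(MongoClient(uri)["analytics"]["orders"], ["id"])`. Index creation fails if the collection already holds documents with duplicate key values, so run it before the first load or after deduplicating.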

## Troubleshooting

| Symptom                                                           | Likely cause                                                                                            | Resolution                                                                                                                                         |
| ----------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Database name is required to connect to the MongoDB database.`   | `dest_table` was supplied as a bare collection name.                                                    | Use `<database>.<collection>` form.                                                                                                                |
| `Unsupported write disposition 'append' for MongoDB destination.` | `incremental_strategy: append` was set.                                                                 | Use `replace` or `merge`.                                                                                                                          |
| `Merge operation requires primary keys for table '<collection>'.` | `incremental_strategy: merge` is set but the dataset spec has no `primary_key`.                         | Declare a `primary_key` on the dataset in the manifest.                                                                                            |
| `ServerSelectionTimeoutError`                                     | Atlas IP allow-list rejects the runtime egress IP, or the SRV record could not be resolved.             | Add the runtime egress range to the Atlas Network Access list and confirm DNS resolution from inside the runtime.                                  |
| `OperationFailure: Authentication failed`                         | Wrong username/password, wrong `authSource`, or the user is not provisioned on the database/collection. | Verify credentials, add `authSource=admin` (or the correct auth DB) to the URI query, and confirm the user has `readWrite` on the target database. |
| Duplicate-key error during `merge`                                | Documents have ambiguous primary-key values, or the primary key is not actually unique upstream.        | Either deduplicate upstream or pick a primary key that is genuinely unique.                                                                        |
| TLS handshake failures                                            | Cluster requires TLS; URI did not opt in.                                                               | Add `tls=true` to the URI query parameters (and, for self-signed dev clusters, `tlsAllowInvalidCertificates=true`), joining with `&` if the URI already has a query string. |
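
When adding an option such as `tls=true` or `authSource=admin` to an existing URI, the separator depends on whether the URI already has a query string. The helper below is a small sketch of that rule; `add_query_option` is a hypothetical name, and it assumes credentials are already percent-encoded so that `?` only appears as the query delimiter.

```python
def add_query_option(uri, key, value):
    """Add or replace one query option on a MongoDB URI."""
    base, _, query = uri.partition("?")
    # Keep existing options, dropping any earlier value for the same key.
    pairs = [p for p in query.split("&") if p and p.split("=", 1)[0] != key]
    pairs.append(f"{key}={value}")
    return base + "?" + "&".join(pairs)


print(add_query_option("mongodb://user:pw@mongo.example.com:27017", "tls", "true"))
# prints mongodb://user:pw@mongo.example.com:27017?tls=true
```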

## Related docs

* [MongoDB (Batch)](/concepts/resources/nilus/batch/batch-sources/mongodb.md): companion batch source.
* [MongoDB (CDC)](/concepts/resources/nilus/cdc/cdc-sources/mongodb.md): companion CDC source.
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy` and dataset-shape tuning.

