# MongoDB: Error 286 in CDC Pipelines

This article explains MongoDB CDC error 286, its root cause (oplog history loss), and how to prevent and recover from it.

## TL;DR

* Error 286 is a symptom of history loss: the oplog entry that the connector's resume token points to is gone.
* The root cause is an undersized or force-truncated oplog relative to the maximum connector lag.
* Prevention: correct oplog sizing, sufficient connector throughput, and monitoring.
* Recovery is straightforward (new snapshot or larger oplog) but can be time-consuming. Plan ahead.

## Overview

Error 286 indicates that a MongoDB change stream (which Nilus relies on for change data capture) attempted to resume from a point that is no longer present in the replica-set oplog (`local.oplog.rs`). When this happens, Nilus logs the following exception:

```
Command failed with error 286 (ChangeStreamHistoryLost):
Resume of change stream was not possible, as the resume point may no longer be in the oplog
```

Understanding why the oplog entry disappeared and how to size and monitor the oplog is critical for reliable CDC.
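
For alerting, it can help to recognise this exception in log output. The following is a minimal sketch that matches only the exception text shown above; the helper name `isHistoryLost` is hypothetical, and it assumes the log line format stays as quoted.

```javascript
// Hypothetical helper: decide whether a log line reports MongoDB error 286.
// The pattern is derived solely from the exception text quoted in this article.
const HISTORY_LOST_PATTERN = /error 286\b|ChangeStreamHistoryLost/;

function isHistoryLost(logLine) {
  return HISTORY_LOST_PATTERN.test(logLine);
}

const sample =
  "Command failed with error 286 (ChangeStreamHistoryLost): " +
  "Resume of change stream was not possible, as the resume point may no longer be in the oplog";

console.log(isHistoryLost(sample));                 // true
console.log(isHistoryLost("Connection timed out")); // false
```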

## The MongoDB oplog in a nutshell

* A capped collection (`local.oplog.rs`) that stores every write against the primary so secondaries (and tools such as Nilus) can replicate those changes.
* The oplog is truncated on a first-in-first-out basis once it reaches its allocated size.
* Default size: when you initiate a replica set, MongoDB chooses the oplog size automatically. With the WiredTiger storage engine this is 5% of free disk space, with a lower bound of 990 MB and an upper bound of 50 GB.

> Because the collection is capped, what matters is the time window: how many hours of history does this size translate to under your peak write load?

## Why error 286 happens

1. The connector is paused or slowed down, and older entries roll off before it resumes.
2. The oplog is manually shrunk or the replica set is re-initialized, which discards the entries that older resume tokens point to.
3. MongoDB may truncate more aggressively if the filesystem is full.
4. Very high write bursts, such as a bulk load, shrink the effective time window.

> When the connector restarts, it asks MongoDB to resume from the position recorded in its resume token. If the oplog entry at that position has been truncated, MongoDB throws error 286 and Nilus refuses to start.
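
The mechanism can be illustrated with a toy model of a capped, first-in-first-out log. This is not MongoDB or Nilus code: `CappedLog`, `append`, and `resumeAfter` are hypothetical names, and the cap is counted in entries rather than bytes to keep the sketch short.

```javascript
// Toy model of a capped FIFO log (illustration only, not MongoDB code).
class CappedLog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];      // each entry: { position, op }
    this.nextPosition = 1;
  }

  append(op) {
    this.entries.push({ position: this.nextPosition++, op });
    // Oldest entries roll off once the cap is exceeded.
    while (this.entries.length > this.maxEntries) this.entries.shift();
  }

  // Return the entries written after `position`, or throw if the entry the
  // token points to has already been truncated.
  resumeAfter(position) {
    const found = this.entries.some((e) => e.position === position);
    if (!found) {
      const err = new Error("Resume point is no longer in the log");
      err.code = 286; // mirrors ChangeStreamHistoryLost
      throw err;
    }
    return this.entries.filter((e) => e.position > position);
  }
}

const log = new CappedLog(5);
for (let i = 1; i <= 5; i++) log.append(`write-${i}`);
const token = 3; // the connector processed position 3, then paused

console.log(log.resumeAfter(token).length); // 2 (positions 4 and 5)

// A burst of writes arrives while the connector is paused.
for (let i = 6; i <= 9; i++) log.append(`write-${i}`);
try {
  log.resumeAfter(token);
} catch (err) {
  console.log(err.code); // 286
}
```

The same token that worked before the burst fails after it, which is why a restart alone does not help once the history is gone.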

## Recovery options

* Delete and reapply the failing service with a new PV directory or PVC. Alternatively, keep the connector running and delete the offset (from the PVC directory) so Nilus treats it as new and snapshots again.

OR

* Change the name of the connector service (from `nilus:0.0.13` onwards) using the same config. Nilus takes a snapshot (as specified by `snapshot.mode`) and then continues to stream changes.

> **Note:** Error 286 cannot be resolved by simply restarting the connector. Manual intervention is required to restore the service once this error occurs.

## Preventing error 286

1. Size the oplog for worst-case lag

> MongoDB 3.6+ supports `replSetResizeOplog`, and 4.4 adds `minRetentionHours` for time-based guarantees.

The objective is to make sure the oplog retains at least as much history as the connector could ever fall behind, with headroom for bursts. Formula: `RequiredSize(MB) ≈ PeakWrites/sec × MaxLag(sec) × avgEntrySize(bytes) × safetyFactor ÷ 1 048 576`

* **Peak writes/sec**: Insert + Update + Delete ops during the busiest interval (consult `db.serverStatus().opcounters` or monitoring).
* **Max lag**: Longest plausible outage or back-pressure window (connector maintenance + downstream outage + buffer).
* **avgEntrySize**: In bytes; rule of thumb about 1 kB if most documents are small.
* **safetyFactor**: 1.3 to 2.0 depending on risk appetite.

<details>

<summary>Example</summary>

| Parameter       | Value                | Notes                                     |
| --------------- | -------------------- | ----------------------------------------- |
| Peak writes/sec | **15 000** ops       | Observed from Grafana at 95-th percentile |
| Max lag         | **30 min** = 1 800 s | Upgrade window + 10 min contingency       |
| Avg entry size  | **1 kB**             | Typical BSON size of collection docs      |
| Safety factor   | **1.5**              | Gives headroom for burst writes           |

```
Raw volume  = 15 000 × 1 800 × 1 024 B ≈ 27.6 GB
With safety = 27.6 GB × 1.5 ≈ 41.5 GB
```

* Recommendation: Round up to 48 GB when running `replSetResizeOplog` or setting the `--oplogSize` init option.
* A 48 GB oplog holds about 35 min of history at 1.5× the recorded peak (about 52 min at the peak itself), so the 30 min window remains safe even during heavy bursts.

</details>
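
The sizing formula can be sketched as a small function. The function and parameter names below are illustrative, not part of MongoDB or Nilus, and the result is expressed in binary megabytes (1 MB = 1 048 576 bytes), which is an assumption about how you want to round.

```javascript
// Sketch of the oplog sizing formula; all names are illustrative.
const BYTES_PER_MB = 1024 * 1024;

function requiredOplogSizeMB({ peakWritesPerSec, maxLagSec, avgEntryBytes, safetyFactor }) {
  // Bytes written to the oplog during the worst-case lag window.
  const rawBytes = peakWritesPerSec * maxLagSec * avgEntryBytes;
  // Apply headroom, convert to MB, and round up to a whole megabyte.
  return Math.ceil((rawBytes * safetyFactor) / BYTES_PER_MB);
}

const sizeMB = requiredOplogSizeMB({
  peakWritesPerSec: 15000,
  maxLagSec: 1800,
  avgEntryBytes: 1024,
  safetyFactor: 1.5,
});

console.log(sizeMB); // 39551
```

With the example inputs from the table, the result is roughly 41.5 GB in decimal units, which is then rounded up to a convenient size when resizing.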

2. Monitor key metrics

* Use `rs.printReplicationInfo()` to retrieve information on the oplog status, including the size of the oplog and the time range of operations.
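
For automated checks, the same information is available as a document from `db.getReplicationInfo()` in the MongoDB shell. The sketch below assumes that document exposes `logSizeMB`, `usedMB`, and `timeDiff` (in seconds); the helper name `summariseOplog` is hypothetical, and a hand-written sample stands in for the live output.

```javascript
// Summarise the oplog window from a replication-info document.
// Assumes the field names returned by db.getReplicationInfo().
function summariseOplog(info) {
  return {
    windowHours: info.timeDiff / 3600,                  // timeDiff is in seconds
    usedPercent: (info.usedMB / info.logSizeMB) * 100,  // how full the oplog is
  };
}

// In the MongoDB shell you would call: summariseOplog(db.getReplicationInfo())
// Sample values below are made up for illustration.
const sampleInfo = { logSizeMB: 49152, usedMB: 36864, timeDiff: 5400 };
const summary = summariseOplog(sampleInfo);

console.log(summary.windowHours); // 1.5
console.log(summary.usedPercent); // 75
```

If `usedPercent` is well below 100, the oplog has not yet filled, so the reported window is still growing and understates the steady-state retention.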

3. Avoid long pauses

* Schedule connector downtime within the calculated oplog window.

4. Recommended Nilus source options

<details>

<summary>Sample Configuration</summary>

```yaml
source:
  address: dataos://mongodept
  options:
    strategy: flatten
  cdc:
    collection.include.list: "spam.product"
    topic.prefix: "cdc_changelog"
    snapshot.mode: "when_needed"
    max.batch.size: 250
    max.queue.size: 2000
    max.queue.size.in.bytes: "134217728"
    heartbeat.interval.ms: 6000
    offset.flush.interval.ms: 15000

sink:
  address: dataos://testawslh
  options:
    dest_table: mongodb_test
    incremental_strategy: append
    aws_region: us-west-2
```

</details>

| Property                                                       | Why it helps                                                                                      | Suggested value                                                                                                                                                                                                                                                                                                                           |
| -------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `snapshot.mode` | Controls what Nilus does when offsets are missing. | <p>• Set to <code>initial</code> (default) or <code>always</code> if you anticipate long downtimes.<br>• With <code>when\_needed</code>, the connector performs a snapshot after startup only if it cannot detect any topic offsets, or if a previously recorded offset points to a log position that is no longer available on the server.</p> |
| `offset.flush.interval.ms`                                     | How often offsets are committed. Shorter intervals reduce duplicate events after crashes.         | 15000 ms                                                                                                                                                                                                                                                                                                                                  |
| `heartbeat.interval.ms`                                        | Emits heartbeat records to keep offsets moving even when no data changes. Helps detect lag early. | 5000–10000 ms                                                                                                                                                                                                                                                                                                                             |
| `max.batch.size`, `max.queue.size` & `max.queue.size.in.bytes` | Tune to keep connector processing speed above the peak write rate, avoiding backlog. | Start small (e.g. 250 / 2000 / 128 MB), then adjust according to data volume and change frequency |

## Operational playbook

| Phase                      | Checklist                                                                                     |
| -------------------------- | --------------------------------------------------------------------------------------------- |
| **Daily**                  | <p>- Monitor <code>oplog window</code> & connector lag<br>- Alert if lag > 80 % of window</p> |
| **Before maintenance**     | - Calculate expected pause; if > window, increase oplog temporarily                           |
| **After unplanned outage** | - If the connector fails with 286, choose a recovery option: re‑snapshot under a new service name, or clean the PV directory |
| **After success**          | - Review sizing assumptions; adjust `oplogSizeMB` or Nilus throughput limits                  |

## Useful MongoDB shell commands

```
// Check how many hours of history are currently in the oplog
rs.printReplicationInfo();

// Show the newest record in the oplog
use local;
db.oplog.rs.find().sort({$natural:-1}).limit(1).pretty();

// Resize the oplog on the member you are connected to
// (size is in MB; minRetentionHours requires MongoDB 4.4+)
use admin;
db.adminCommand({replSetResizeOplog:1, size: <MB>, minRetentionHours: <hours>});
```

