
# MongoDB: BufferingChangeStreamCursor Warning

This article explains the `BufferingChangeStreamCursor` warning, what it means for your CDC pipeline, and how to resolve it.

## TL;DR

* This warning indicates that the MongoDB CDC pipeline temporarily stopped reading new changes because its internal buffer filled up.
* It is not an error by itself, but if it continues for long periods, it can cause CDC lag, resume-token failures, or change stream interruptions.

### To prevent the issue

* Reduce CDC batch size.
* Increase service memory and set memory limits (if not set already).
* Make sure sink writes are healthy (check source event lag and sink commit behavior).
* Avoid long-running Iceberg commits or high write spikes.

### If it occurs

* Verify whether CDC is still progressing.
* Check sink performance.
* Restart the CDC service only if offsets stop advancing.
* Take action before the MongoDB oplog window is exceeded.

## Overview

```
Unable to acquire buffer lock, buffer queue is likely full
```

This warning comes from the Nilus MongoDB CDC reader. It means the runtime's in-memory buffer is full and cannot accept more events until the downstream consumer catches up.

### Key points

* The warning does not mean data loss.
* It can indicate backpressure in the pipeline.
* If backpressure persists long enough, it can eventually lead to MongoDB change stream errors, such as:
  * `ChangeStreamHistoryLost`
  * `InvalidResumeToken`

This document explains why it happens, how to mitigate it proactively, and what to do if it occurs.

## Root cause

The warning appears when the event-production rate (MongoDB writes) temporarily exceeds the event-consumption rate (Nilus processing plus Iceberg writes). Typical contributing factors are:

1. Downstream sink is slow (most common)

Nilus writes CDC data to a destination (sink). These writes can slow down for reasons such as:

* Large Iceberg commits
* Compaction or manifest rewrites
* S3 throttling or retry storms
* High latency writes

When that happens, the Nilus CDC runtime cannot drain its queue fast enough. *Result: buffer fills → warning appears.*

2. Large CDC batch size

`max.batch.size: 2048` (default)

* May cause the CDC runtime to process very large chunks at once.
* Large batches increase processing time and memory consumption, slowing down queue draining.

3. Insufficient or poorly sized memory

A heap that does not fit the workload can lead to:

* Very small heap → frequent minor GCs
* Very large heap → long GC pauses
* GC pauses → slow consumer thread → queue full

4. No Kubernetes memory limits

If container memory limits are not set:

* The JVM may assume it has access to the full node's memory.
* It may pick an inappropriate heap size automatically.
* During heavy load, this causes GC pressure and stalls.

5. CPU contention

In a single-replica setup, heavy pipeline activity combined with large batches can saturate the CPU.

6. High write bursts from MongoDB

During traffic spikes, the change stream volume can exceed regular processing capacity.
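All of these factors reduce to the same rate imbalance. The following JavaScript sketch estimates how long a bounded buffer takes to fill when events arrive faster than they drain. The function name and all numbers are illustrative assumptions, not Nilus internals or defaults.

```javascript
// Illustrative only: estimates the time until a bounded in-memory buffer is full,
// assuming constant production and consumption rates (events per second).
function secondsUntilBufferFull(capacity, currentFill, producedPerSec, consumedPerSec) {
  const netInflowPerSec = producedPerSec - consumedPerSec;
  if (netInflowPerSec <= 0) {
    return Infinity; // The consumer keeps up, so the buffer never fills.
  }
  const freeSlots = Math.max(0, capacity - currentFill);
  return freeSlots / netInflowPerSec;
}

// Hypothetical numbers: empty 8,192-slot buffer, 5,000 events/s in, 3,000 events/s out.
console.log(secondsUntilBufferFull(8192, 0, 5000, 3000)); // 4.096
```

In this model, a sink that drains even moderately slower than MongoDB produces fills the buffer within seconds. That is why short write bursts or a slow commit are enough to trigger the warning.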

## Prevention

The following configuration adjustments significantly reduce the likelihood of buffer-full conditions.

1. Reduce CDC batch size

Smaller batches mean faster downstream commits and a steadier buffer drain.

<details>

<summary>Example</summary>

```
max.batch.size: 1024
# or
max.batch.size: 512
```

</details>

2. Allocate sufficient (or more) memory to the service

This stops the service from operating at the edge of its memory budget.

3. Define resource memory limits

Why this matters:

* Predictable heap sizing
* Reduced GC stalls
* Improved CDC throughput

## Resolution

Use the following checklist to assess whether the warning is transient or serious.

1. Verify whether CDC is still progressing

* Check the sink dataset:
  * Are new rows appearing?
  * Is the CDC timestamp (`_ts` or equivalent) moving forward?
* Check offset logs:
  * If offsets are updating → pipeline is healthy, warning was transient.
* Check whether heartbeats are being processed:
  * Are new heartbeats committed to the heartbeat dataset?
  * How far behind the current time is the commit timestamp of the latest heartbeat?
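The checks above can be combined into a simple decision. This JavaScript sketch assumes two inputs: a few successive readings of the last committed CDC position (for example, the source timestamp of the latest offset, in epoch seconds), and the commit time of the latest heartbeat. The function name and input shapes are hypothetical.

```javascript
// Hypothetical helper: decide whether CDC is still progressing.
// offsetReadings: successive readings of the committed source position, oldest first.
// lastHeartbeatMs / nowMs: epoch milliseconds of the latest heartbeat commit and of "now".
function assessCdcProgress(offsetReadings, lastHeartbeatMs, nowMs) {
  if (offsetReadings.length < 2) {
    throw new Error("Need at least two offset readings to detect movement");
  }
  const first = offsetReadings[0];
  const latest = offsetReadings[offsetReadings.length - 1];
  return {
    offsetsAdvancing: latest > first, // false means offsets have stalled
    heartbeatAgeSec: (nowMs - lastHeartbeatMs) / 1000,
  };
}

console.log(assessCdcProgress([1700000000, 1700000000, 1700000042], 1000, 31000));
// { offsetsAdvancing: true, heartbeatAgeSec: 30 }
```

Advancing offsets mean the warning was transient. Stalled offsets combined with a growing heartbeat age are the signal to act.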

2. Measure CDC lag

* Compare the last ingested timestamp with the MongoDB server time.

<details>

<summary>How to check MongoDB Server Time</summary>

Replica set members rely on **synchronized clocks** for:

* Oplog timestamp ordering
* Heartbeat timeouts and election timing
* Write concern “majority” acknowledgment

**Run this in your MongoDB shell.** The `localTime` field in the response is the server's current time:

```
db.adminCommand({ hello: 1 })
```

**Check the time on all replica set members**

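```javascript
// Consolidated check: connect directly to each member and print its clock.
// hello runs without authentication, so no credentials are passed here;
// add TLS options to the URI if your cluster requires them.
rs.status().members.forEach(m => {
  const conn = new Mongo(`mongodb://${m.name}/?directConnection=true`);
  const hello = conn.getDB("admin").runCommand({ hello: 1 });
  print(`${m.name}: ${hello.localTime}`);
});

// Or check each member individually from a terminal
// (mongo1, mongo2, and mongo3 are example host names):
//   mongosh --host mongo1:27017 --quiet --eval 'db.adminCommand({ hello: 1 }).localTime'
//   mongosh --host mongo2:27017 --quiet --eval 'db.adminCommand({ hello: 1 }).localTime'
//   mongosh --host mongo3:27017 --quiet --eval 'db.adminCommand({ hello: 1 }).localTime'
```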

</details>
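CDC lag is the difference between the MongoDB server time (the `localTime` field of `hello`) and the last ingested CDC timestamp from the sink. The following is a minimal JavaScript sketch. It assumes both values are available as ISO-8601 strings or `Date` objects; the function name is hypothetical.

```javascript
// Hypothetical helper: CDC lag in seconds between MongoDB server time and
// the newest change timestamp that reached the sink.
function cdcLagSeconds(serverTime, lastIngestedTime) {
  const serverMs = new Date(serverTime).getTime();
  const ingestedMs = new Date(lastIngestedTime).getTime();
  if (Number.isNaN(serverMs) || Number.isNaN(ingestedMs)) {
    throw new Error("Both timestamps must be valid dates");
  }
  return (serverMs - ingestedMs) / 1000;
}

console.log(cdcLagSeconds("2025-09-17T09:40:00Z", "2025-09-17T09:38:30Z")); // 90
```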

3. Check sink performance

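* Query the Nilus metadata tables (`load_info` and `runs_info`, stored in PostgreSQL) and check the throughput. The query uses Trino SQL functions such as `regexp_extract`, `reduce`, and `map_values`. Run it through a Trino-compatible query engine that exposes the Nilus database as the `nilusdb` catalog.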

<details>

<summary>Query</summary>

```sql
SELECT
    li.*,
    ri.dataos_resource_id,
    ri.total_records,

    -- Extract tag from dataos_resource_id
    regexp_extract(ri.dataos_resource_id, 'workflow:v1:wf-([^-]+)-', 1) AS tag,

    -- Calculate MB/sec (guarded against zero duration)
    CASE
        WHEN ri.duration_sec > 0 THEN CAST(ri.files_size_mb AS double) / ri.duration_sec
        ELSE NULL
    END AS mb_per_sec,

    -- Calculate records/sec (guarded against zero duration)
    CASE
        WHEN ri.duration_sec > 0 THEN CAST(ri.total_records AS double) / ri.duration_sec
        ELSE NULL
    END AS events_per_sec

FROM "nilusdb"."public".load_info li
JOIN (
    SELECT
        id,
        run_id,
        load_id,
        started_at,
        finished_at,
        duration_sec,
        files_size_mb,
        memory_mb,
        cpu_percent,
        dataos_resource_id,
        -- Sum the values of the records_count map
        reduce(
            map_values(CAST(records_count AS map(varchar, integer))),
            0,
            (s, x) -> s + x,
            s -> s
        ) AS total_records
    FROM "nilusdb"."public".runs_info
    WHERE run_as_user = 'dataos-manager'              -- define your username
      AND dataos_resource_id LIKE 'workflow:v1:wf-%'  -- define your resource ID pattern here
      -- AND finished_at > TIMESTAMP '2025-09-17 09:38:00.000 UTC'
) ri ON li.load_id = ri.load_id AND li.run_id = ri.run_id
-- WHERE ri.total_records > 1000
ORDER BY ri.started_at DESC;
```

</details>
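To spot a sink slowdown in the query output, compare each run's `events_per_sec` against the typical (median) value. This JavaScript sketch assumes the rows have been exported as objects with `run_id` and `events_per_sec` fields. The helper name and the 50% threshold are illustrative assumptions.

```javascript
// Hypothetical helper: flag runs whose throughput falls well below the median.
// rows: objects with run_id and events_per_sec, e.g. exported from the query above.
function flagSlowRuns(rows, fraction = 0.5) {
  const rates = rows
    .map(r => r.events_per_sec)
    .filter(v => typeof v === "number" && Number.isFinite(v)) // skip NULL rates
    .sort((a, b) => a - b);
  if (rates.length === 0) return [];
  const mid = Math.floor(rates.length / 2);
  const median = rates.length % 2 === 1 ? rates[mid] : (rates[mid - 1] + rates[mid]) / 2;
  return rows.filter(
    r => typeof r.events_per_sec === "number" && r.events_per_sec < median * fraction
  );
}

const sampleRuns = [
  { run_id: "a", events_per_sec: 100 },
  { run_id: "b", events_per_sec: 120 },
  { run_id: "c", events_per_sec: 30 },
  { run_id: "d", events_per_sec: null },
];
console.log(flagSlowRuns(sampleRuns).map(r => r.run_id)); // [ 'c' ]
```

Runs that are consistently flagged around the time the warning appears point to the sink as the bottleneck.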

4. Assess the stability of the warning

* If the warning lasts less than 5 seconds:
  * This is normal, temporary backpressure.
  * No action is needed.
* If the warning lasts about 5–30 seconds:
  * Monitor closely; CDC lag may grow.
  * Check sink performance and memory usage.
* If the warning persists for more than 30 seconds:
  * Treat this as a risk zone for `ChangeStreamHistoryLost`, especially if CDC lag is approaching the oplog window.
  * If offsets have stopped advancing, restart the service.

Restarting resets the Nilus CDC reader threads. The pipeline then resumes from its last committed position in the change stream, as long as that position is still within the oplog window.
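The duration thresholds above can be encoded as a small JavaScript function, for example in an alerting script that tracks how long the warning has been logged continuously. The function name and its input are hypothetical; the thresholds are the ones listed in this section.

```javascript
// Encodes the duration thresholds from the checklist above.
// durationSec: how long the warning has been logged continuously (hypothetical input).
function classifyBufferWarning(durationSec) {
  if (durationSec < 5) {
    return "transient: normal temporary backpressure, no action needed";
  }
  if (durationSec <= 30) {
    return "monitor: watch CDC lag, sink performance, and memory usage";
  }
  return "risk: check offsets and restart the service if they have stopped advancing";
}

console.log(classifyBufferWarning(12)); // monitor: watch CDC lag, sink performance, and memory usage
```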

## Useful MongoDB shell commands

1. Check oplog size and window

This helps determine the available oplog window (how far the CDC runtime can fall behind).

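```javascript
use local
// Oplog collection statistics, including its configured maximum size (maxSize, in bytes)
db.oplog.rs.stats()
// Oldest and newest entries still in the oplog; their "ts" fields bound the window
db.oplog.rs.find().sort({ $natural: 1 }).limit(1)
db.oplog.rs.find().sort({ $natural: -1 }).limit(1)
// Shortcut: prints the configured oplog size and the time span it currently covers
rs.printReplicationInfo()
```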

2. Estimate oplog window duration

This is crucial: if the CDC lag exceeds the oplog window, the change stream will fail.

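```javascript
// The oplog lives in the "local" database of each replica set member.
var oplog = db.getSiblingDB("local").oplog.rs;
var first = oplog.find().sort({ $natural: 1 }).limit(1).next();
var last = oplog.find().sort({ $natural: -1 }).limit(1).next();
// "ts" is a BSON Timestamp whose high 32 bits are seconds since the Unix epoch.
var windowSec = last.ts.getHighBits() - first.ts.getHighBits();
print((windowSec / 60).toFixed(1) + " minutes");
// Equivalent shortcut: db.getReplicationInfo().timeDiff reports the window in seconds.
```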
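The oplog window from the command above can be compared with the CDC lag measured during the Resolution checklist. The comparison shows how much time remains before the change stream can no longer resume. The following is a minimal JavaScript sketch; the function name and example values are hypothetical.

```javascript
// Hypothetical helper: compare the current CDC lag with the oplog window.
function oplogHeadroom(oplogWindowMinutes, cdcLagSeconds) {
  const headroomMinutes = oplogWindowMinutes - cdcLagSeconds / 60;
  return {
    headroomMinutes,
    // Once lag exceeds the window, the resume position has likely rolled off the oplog.
    canResume: headroomMinutes > 0,
  };
}

console.log(oplogHeadroom(1440, 90)); // { headroomMinutes: 1438.5, canResume: true }
```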

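3. Check recent write rate

`opcounters` values are cumulative since the `mongod` process last started. To get a rate, compare two readings taken a known interval apart.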

```javascript
db.serverStatus().opcounters
```
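The following JavaScript sketch turns two `opcounters` snapshots into per-second write rates. Because the counters only increase, the rate is the difference divided by the sampling interval. Run it against the primary, because secondaries count replicated writes under `opcountersRepl`. The helper name is hypothetical.

```javascript
// Hypothetical helper: per-second write rates from two opcounters snapshots.
// Number() also converts int64 (Long) values as returned in mongosh.
function opcounterRates(before, after, intervalSec) {
  const rates = {};
  for (const key of ["insert", "update", "delete"]) {
    rates[key] = (Number(after[key]) - Number(before[key])) / intervalSec;
  }
  return rates;
}

// In mongosh (sketch):
//   const a = db.serverStatus().opcounters; sleep(10000);
//   const b = db.serverStatus().opcounters;
//   opcounterRates(a, b, 10);
console.log(opcounterRates({ insert: 100, update: 50, delete: 0 },
                           { insert: 1100, update: 550, delete: 20 }, 10));
// { insert: 100, update: 50, delete: 2 }
```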

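4. Check collection-level statistics

`stats()` reports a collection's document count and storage size. Comparing readings taken over time shows how quickly the collection is growing.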

```javascript
db.<collection>.stats()
```


