# Correctness Knobs

These knobs decide *what ends up* in the destination. Get them wrong and the data is wrong.

## `incremental_strategy` (sink)

| Goes under     | Default              | Valid values                 |
| -------------- | -------------------- | ---------------------------- |
| `sink.options` | required, no default | `replace`, `append`, `merge` |

How repeated runs interact with the destination table.

| Strategy  | Use it for                                                          | Behavior                                                               |
| --------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| `replace` | Small lookup tables, controlled snapshots, pre-production backfills | Drops and re-creates the destination table on every run.               |
| `append`  | Immutable, event-style data: logs, events, history, telemetry       | Adds new rows; never updates or deletes.                               |
| `merge`   | Mutable entity tables: customers, subscriptions, orders, accounts   | Upserts on `primary_key`. Requires a stable primary key in the source. |

Some destinations restrict the supported set. See [Destinations](/concepts/resources/nilus/destinations.md) for per-destination strategy support. The most common surprises:

* **MongoDB destination**: supports `replace` and `merge` only; `append` is explicitly rejected.
* **Elasticsearch destination**: supports `replace` only, with drop-and-recreate-index semantics.
* **AWS S3 file-system destination**: supports `replace` and `append`. `replace` requires the `s3:DeleteObject` IAM permission.

### Example, `append`

```yaml
spec:
  type: batch
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.events
      incremental_key: event_time
  sink:
    address: dataos://lakehouse
    options:
      dest_table: raw.events
      incremental_strategy: append
```

### Example, `merge`

```yaml
spec:
  type: batch
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.customers
      incremental_key: updated_at
      primary_key: id
  sink:
    address: postgresql://user:pass@host:5432/warehouse
    options:
      dest_table: raw.customers
      incremental_strategy: merge
```

### Example, `replace`

```yaml
spec:
  type: batch
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.country_codes
  sink:
    address: dataos://lakehouse
    options:
      dest_table: raw.country_codes
      incremental_strategy: replace
```

## `incremental_key` (source)

| Goes under       | Default | Type                 |
| ---------------- | ------- | -------------------- |
| `source.options` | unset   | string (column name) |

Tells Nilus which column represents "newness" so each run can extract only the rows that have appeared or changed since the last successful run.

* Prefer `updated_at` for mutable tables.
* Prefer `created_at` or event time for append-only datasets.
* The column must be **monotonically increasing** in the source. Rows mutated without their `updated_at` being touched will be missed.
* The column should be indexed in the source database. Un-indexed scans are the most common cause of slow incremental runs.
* Recommended types: `timestamp`, `datetime`, `date`, or a sequence-backed integer.
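
When an append-only table has no trustworthy timestamp, a sequence-backed integer can serve as the cursor instead. The following is a minimal sketch, not a reference configuration: the `public.audit_log` table and its `log_id` column are hypothetical names chosen for illustration, and the rest mirrors the `append` example above.

```yaml
spec:
  type: batch
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.audit_log   # hypothetical table name
      incremental_key: log_id          # hypothetical sequence-backed integer column
  sink:
    address: dataos://lakehouse
    options:
      dest_table: raw.audit_log
      incremental_strategy: append
```

This only works as intended if `log_id` never decreases and is indexed in the source, per the guidance above.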

## `primary_key` (source)

| Goes under       | Default                      | Type                 |
| ---------------- | ---------------------------- | -------------------- |
| `source.options` | unset (required for `merge`) | string (column name) |

Identifies the logical row. Used by `incremental_strategy: merge` to decide whether an inbound row is an insert or an update.

* Start with one stable business key, typically `id`.
* Avoid composite keys unless the source model genuinely requires them.
* If the source has no reliable row identifier, do not use `merge`.

## `type_hints` (source)

| Goes under       | Default  | Type                               |
| ---------------- | -------- | ---------------------------------- |
| `source.options` | inferred | object map (`column_name: <type>`) |

Overrides Nilus' automatic type inference when the source's declared type is ambiguous or operationally important. Each entry is `column_name: <type>`. Supported types:

`text`, `bigint`, `bool`, `timestamp`, `date`, `decimal`, `double`, `binary`, `json`, `time`.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.users
      incremental_key: updated_at
      type_hints:
        created_at: timestamp
        updated_at: timestamp
        signup_date: date
        amount: decimal
```

Hint **only** ambiguous columns. Hinting columns the source already declares correctly adds no value and increases the chance of a future schema-drift mismatch.

## `interval_start` / `interval_end` (source)

| Goes under       | Default | Type            |
| ---------------- | ------- | --------------- |
| `source.options` | unset   | ISO-8601 string |

Bounded extraction window. Both bounds are optional and independent. Set one or both. Useful for one-off backfills, replays, or windowed catch-up runs.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.orders
      incremental_key: updated_at
      interval_start: "2024-01-01T00:00:00Z"
      interval_end:   "2024-04-01T00:00:00Z"
```

When these bounds are set alongside `incremental_key`, the incremental cursor is constrained to `[interval_start, interval_end]`. Without `incremental_key`, the bounds have no effect.
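
Because the two bounds are independent, a catch-up run can set only one of them. The sketch below reuses the `public.orders` example with an illustrative start date and no `interval_end`. It assumes that omitting a bound leaves that side of the window open, which is how the description above reads.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.orders
      incremental_key: updated_at              # needed for the bound to take effect
      interval_start: "2024-03-01T00:00:00Z"   # illustrative lower bound
      # interval_end omitted: assumed to leave the upper side of the window open
```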

## `mask` (source)

| Goes under       | Default | Type                                          |
| ---------------- | ------- | --------------------------------------------- |
| `source.options` | unset   | object map (`column_name: algorithm[:param]`) |

Column-level data masking applied during extraction. Use it to hash or redact PII before the data hits the sink.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.users
      mask:
        email: hash
        phone: "partial:3"
        ssn: redact
```

Supported algorithms include `hash`, `partial:<n>`, and `redact`. The full list is enumerated in the per-connector pages where masking has connector-specific implications.

## `full_refresh` (sink)

| Goes under     | Default | Type    |
| -------------- | ------- | ------- |
| `sink.options` | `false` | boolean |

Resets the **internal pipeline state** so the next run re-extracts the entire history rather than continuing from the last cursor.

`full_refresh` is **not** the same as `incremental_strategy: replace`:

* `incremental_strategy: replace` writes the new run's output by truncating and reloading the destination, but the cursor state is preserved. Subsequent `replace` runs still respect `incremental_key`-driven extraction. Useful for "I want a clean snapshot of the latest source state every run."
* `full_refresh: true` drops the cursor state, so the next run re-extracts from the beginning regardless of `incremental_key`. Useful for "I broke my state, please re-extract everything."

Use `full_refresh: true` for rare, deliberate one-off runs (state reset after a schema break, after an upstream backfill that mutated old rows, after a restore-from-backup of the source). Set it back to `false` (or remove it) for steady-state runs. Leaving it `true` makes every scheduled run a full re-extract.
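
As an illustration, a one-off state reset for the `merge` pipeline shown earlier might look like the sketch below. It simply adds `full_refresh` under `sink.options`, as the table above specifies; the table names and connection strings are the same placeholders used elsewhere on this page.

```yaml
spec:
  type: batch
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.customers
      incremental_key: updated_at
      primary_key: id
  sink:
    address: postgresql://user:pass@host:5432/warehouse
    options:
      dest_table: raw.customers
      incremental_strategy: merge
      full_refresh: true   # one-off reset; remove or set to false after this run
```

Keeping `incremental_strategy: merge` here means the re-extracted history is upserted on `primary_key` rather than appended as duplicates.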

