# Sampling Knobs

*Part of* [*Optimize Sink Datasets*](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md)*.*

These knobs cap or shape the extraction side for testing, validation, partial backfills, and Exploratory Data Analysis (EDA). They are **not** production knobs. Leave them unset on steady-state schedules.

## `sql_limit` (source)

| Goes under       | Default | Type    |
| ---------------- | ------- | ------- |
| `source.options` | unset   | integer |

Caps the total rows extracted per run. Applied as a `LIMIT` clause on SQL sources. Use for validation runs and quick sanity checks. Common range: `1000` to `100000`.
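A minimal sketch of a validation run capped at 10,000 rows. It assumes the same manifest layout as the `sql_exclude_columns` example on this page; the connection string and table name are placeholders, and the limit value is only an illustration taken from the common range above.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app   # placeholder connection string
    options:
      source_table: public.tickets   # placeholder table
      sql_limit: 10000               # extract at most 10,000 rows in this run
```

Remove `sql_limit` again before the pipeline returns to a steady-state schedule.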

## `yield_limit` (source)

| Goes under       | Default | Type    |
| ---------------- | ------- | ------- |
| `source.options` | unset   | integer |

Caps the number of pages yielded by the source. Page-level analogue of `sql_limit`. Works for non-SQL sources too. If `page_size: 50000` and `yield_limit: 4`, the source yields at most 200,000 rows.
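The page arithmetic above could be configured roughly as follows. This is a sketch under assumptions: the connection string and table are placeholders, and `page_size` is assumed here to sit under `source.options` next to `yield_limit`, which this page does not confirm.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app   # placeholder connection string
    options:
      source_table: public.tickets   # placeholder table
      page_size: 50000               # assumed placement under source.options
      yield_limit: 4                 # at most 4 pages, so at most 4 x 50,000 = 200,000 rows
```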

## `sql_exclude_columns` (source)

| Goes under       | Default | Type                   |
| ---------------- | ------- | ---------------------- |
| `source.options` | unset   | comma-separated string |

Columns to drop during extraction. Useful for excluding large blob/debug columns that are not needed downstream.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app
    options:
      source_table: public.tickets
      sql_exclude_columns: "debug_payload,raw_html"
```

## `sql_reflection_level` (source)

| Goes under       | Default | Valid values      |
| ---------------- | ------- | ----------------- |
| `source.options` | `full`  | `full`, `limited` |

Controls how thoroughly Nilus reflects the source schema before extraction. `full` (the default) inspects every column, type, and constraint. `limited` reduces reflection depth on very wide source tables (1000+ columns), which speeds up extraction startup at the cost of less accurate type inference. This is typically fine when `type_hints` is set explicitly for the columns that matter.

Do not set `sql_reflection_level: limited` while leaving `type_hints` unset on production tables with non-obvious types (datetimes, UUIDs, decimals). Without explicit hints, gaps in type inference surface later as downstream cast errors.
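For a very wide table, the reduced reflection level might be set as sketched below. The connection string and table name are hypothetical placeholders. The `type_hints` syntax is not described on this page, so it is left as a comment and not guessed.

```yaml
spec:
  source:
    address: postgresql://user:pass@host:5432/app   # placeholder connection string
    options:
      source_table: public.wide_events   # hypothetical table with 1000+ columns
      sql_reflection_level: limited      # shallower schema reflection, faster startup
      # Also set type_hints for datetime, UUID, and decimal columns;
      # its exact syntax is not covered on this page.
```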

