
# Batch data movement

Refer to this page when you need to move data on a schedule or as a one-time run. Nilus batch movement extracts data from a source, normalizes it, and loads it into the destination in chunks.

***

## When to use batch

Use batch movement when:

* Periodic updates (hourly, daily, weekly) are acceptable for the use case.
* You need a full reload or an incremental extraction driven by a timestamp or cursor field.
* The source is a SaaS API, analytics warehouse, object storage, or file-based system.
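
In the manifest, a full reload corresponds to `incremental_strategy: replace` with no `incremental_key`. An incremental extraction can be sketched as follows. This assumes `updated_at` is a timestamp column that advances whenever a row changes, and it shows only the `source` and `sink` blocks. Pairing `incremental_key` with `append` is an assumption to confirm in the configuration reference.

```yaml
spec:
  source:
    address: dataos://${{source-depot}}?purpose=ro
    options:
      source_table: "${{schema.table}}"
      # Cursor column (assumed): runs are expected to extract only rows past the last loaded value.
      incremental_key: updated_at
  sink:
    address: dataos://${{sink-depot}}?purpose=rw
    options:
      dest_table: "${{schema.table}}"
      # Add the extracted rows without modifying rows already in the destination.
      incremental_strategy: append
```

With `append`, a source row that is updated after its first load is extracted again and added as a new row. If rows change in place, `merge` with a `primary_key` is likely the better fit.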

***

## Before you start

Make sure you have:

* Access to the DataOS tenant where the pipeline will run.
* A DataOS Depot for the source or an approved connection method with secrets projected via `use.projection`.
* A DataOS Depot or connection for the destination.
* A compute profile available for the pipeline.
* Read access to the source table or collection.
* Write access to the destination table or schema.

***

{% stepper %}
{% step %}

## Choose your source

Nilus batch data movement supports 21 sources across databases, warehouses, lakehouses, SaaS platforms, and message queues, plus a custom source option.

| Category                     | Sources                                                                          |
| ---------------------------- | -------------------------------------------------------------------------------- |
| Databases                    | MongoDB, MySQL, PostgreSQL                                                       |
| Warehouses and query engines | AWS Athena, AWS Redshift, Azure Synapse, Databricks, Google BigQuery, Snowflake |
| Lakehouses                   | AWS DataOS Lakehouse, Azure DataOS Lakehouse, GCS DataOS Lakehouse, Delta Lake   |
| SaaS platforms               | Google Analytics, Google Sheets, HubSpot, Salesforce, Stripe                     |
| Message queues               | Apache Kafka, NATS                                                               |
| Custom                       | Custom Source                                                                    |

For per-connector prerequisites and YAML examples, see [Supported batch sources](/concepts/resources/nilus/batch/batch-sources.md).
{% endstep %}

{% step %}

## Write the manifest

Create a `nilus` resource manifest with `spec.type: batch`. Replace each `${{...}}` placeholder with your own values.

```yaml
name: ${{pipeline-name}}
version: v1alpha
type: nilus
tags:
  - nilus-batch
description: ${{description}}

spec:
  type: batch
  compute: ${{compute-profile}}
  logLevel: INFO

  resources:
    requests:
      cpu: "200m"
      memory: "256Mi"

  schedule:                        # optional; omit for a one-time run
    crons:
      - "0 2 * * *"               # e.g. daily at 02:00 UTC
    timezone: UTC
    concurrencyPolicy: Forbid

  source:
    address: dataos://${{source-depot}}?purpose=ro
    options:
      source_table: "${{schema.table}}"
      incremental_key: updated_at  # optional; use for delta loads
      # primary_key: ${{primary-key-column}}  # required when incremental_strategy is merge

  sink:
    address: dataos://${{sink-depot}}?purpose=rw
    options:
      dest_table: "${{schema.table}}"
      incremental_strategy: replace  # append | replace | merge
```
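
For a one-time run, drop the `schedule` block. The following sketch shows the resulting `spec`, with the top-level fields (`name`, `version`, `type`, `tags`, `description`) left as in the manifest above:

```yaml
spec:
  type: batch
  compute: ${{compute-profile}}
  logLevel: INFO

  resources:
    requests:
      cpu: "200m"
      memory: "256Mi"

  # No schedule block: the pipeline is not triggered by cron and runs once.
  source:
    address: dataos://${{source-depot}}?purpose=ro
    options:
      source_table: "${{schema.table}}"

  sink:
    address: dataos://${{sink-depot}}?purpose=rw
    options:
      dest_table: "${{schema.table}}"
      incremental_strategy: replace
```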

**Sink strategies**

| Strategy  | When to use                                                                    |
| --------- | ------------------------------------------------------------------------------ |
| `append`  | Add new rows without touching existing rows.                                   |
| `replace` | Drop and reload the destination table on every run.                            |
| `merge`   | Upsert rows on `primary_key`, which must be set in the source `options`.       |
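
The following is a minimal `merge` sketch. It follows the rule in the table above that `primary_key` belongs in the source options. `${{primary-key-column}}` is a hypothetical placeholder for the column that uniquely identifies a row, and the remaining `spec` attributes are unchanged.

```yaml
spec:
  source:
    address: dataos://${{source-depot}}?purpose=ro
    options:
      source_table: "${{schema.table}}"
      incremental_key: updated_at           # optional; use for delta loads
      primary_key: ${{primary-key-column}}  # hypothetical placeholder; required for merge
  sink:
    address: dataos://${{sink-depot}}?purpose=rw
    options:
      dest_table: "${{schema.table}}"
      incremental_strategy: merge           # upsert rows on primary_key
```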

For the full attribute reference, see [Batch configurations](/concepts/resources/nilus/batch/pipeline-config.md).
{% endstep %}

{% step %}

## Apply the pipeline

```bash
dataos-ctl resource apply -f ${{path-to-manifest.yaml}}
```

Confirm the resource is active:

```bash
dataos-ctl resource get -t nilus -a
# or
dataos-ctl resource get -t nilus -n ${{pipeline-name}}
```

{% endstep %}
{% endstepper %}

***

**Related reference**

* [Supported batch sources](/concepts/resources/nilus/batch/batch-sources.md): per-connector prerequisites, options, and YAML examples.
* [Supported destinations](/concepts/resources/nilus/destinations.md): configure the sink for your target system.
* [Configurations reference](/concepts/resources/nilus/batch/pipeline-config.md): full batch attribute table.
* [Schema evolution](/concepts/resources/nilus/concepts/schema-evolution.md): how Nilus handles column additions and type changes.
* [Data masking](/concepts/resources/nilus/concepts/understanding-data-masking.md): mask or redact columns during movement.


