# AWS-backed

The AWS-backed DataOS Lakehouse destination writes [Apache Iceberg](https://iceberg.apache.org/docs/latest/)-backed datasets into a Lakehouse Depot whose storage layer is Amazon S3. Nilus uses the same Lakehouse connector for every cloud: the storage backend is selected by the Lakehouse Depot, not by a separate connector type. AWS-backed, Azure-backed, and GCP-backed Lakehouse Depots therefore share the same pipeline shape, sink address pattern, and table behavior. This page covers the AWS / S3 variant; the [Azure-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/azure-backed.md) and [GCP-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/gcp-backed.md) variants are documented separately.

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach the configured S3 endpoint and the Iceberg REST catalog identified by `METASTORE_URL`.
* The Lakehouse Depot must be defined in DataOS with `storageType: s3` and a `spec.s3` block, and must be reachable from the runtime cluster.
* An AWS region is required. Set it on the Depot resource (`spec.s3.region`), pass `aws_region` in the sink options, or expose it to the runtime as `AWS_REGION`.

### Lakehouse Depot shape

For an S3-backed Lakehouse the Depot resource carries:

```yaml
spec:
  storageType: s3
  s3:
    bucket: my-iceberg-bucket
    region: us-east-1
    relativePath: warehouse/         # optional sub-prefix
    endpoint: https://s3.us-east-1.amazonaws.com   # optional, for S3-compatible stores
```

Storage credentials projected from the Depot's secret:

| Secret key          | Required | Notes                                                    |
| ------------------- | -------- | -------------------------------------------------------- |
| `aws_access_key`    | Yes      | Maps to `AWS_ACCESS_KEY_ID` at runtime.                  |
| `aws_secret_key`    | Yes      | Maps to `AWS_SECRET_ACCESS_KEY` at runtime.              |
| `aws_session_token` | No       | Use for temporary credentials (e.g. assumed-role flows). |
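
The mapping in the table above can be sketched as a small Python helper, useful as a pre-flight check on a secret before it is attached to a Depot. This is an illustration only, not Nilus code: `project_secret` is a hypothetical name, and the `AWS_SESSION_TOKEN` target for `aws_session_token` is an assumption based on the standard AWS environment variable name, since the table does not state it.

```python
from typing import Dict

# Hypothetical helper, not part of Nilus. Mirrors the secret-key table above.
REQUIRED_KEYS = {
    "aws_access_key": "AWS_ACCESS_KEY_ID",
    "aws_secret_key": "AWS_SECRET_ACCESS_KEY",
}
OPTIONAL_KEYS = {
    # Assumption: the session token maps to the standard AWS variable name.
    "aws_session_token": "AWS_SESSION_TOKEN",
}


def project_secret(secret: Dict[str, str]) -> Dict[str, str]:
    """Return the environment variables implied by a Depot secret."""
    # Required keys must be present and non-empty.
    missing = [key for key in REQUIRED_KEYS if not secret.get(key)]
    if missing:
        raise ValueError("missing required secret keys: " + ", ".join(missing))

    env = {env_name: secret[key] for key, env_name in REQUIRED_KEYS.items()}
    # Optional keys are projected only when they carry a value.
    for key, env_name in OPTIONAL_KEYS.items():
        if secret.get(key):
            env[env_name] = secret[key]
    return env
```

Calling `project_secret` with only `aws_access_key` and `aws_secret_key` returns the two required variables; adding `aws_session_token` adds a third entry for temporary credentials.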

### Permissions

* The Depot's IAM principal must be able to read, write, and delete objects under the configured bucket / prefix.
* The runtime must be authorized to register tables in the Iceberg REST catalog. The Lakehouse Depot secret must include an `apikey` (a DataOS API token). Nilus reads it as `LAKEHOUSE_APIKEY` and passes it to the Iceberg REST catalog as the connection token. If the secret has no `apikey`, the pipeline fails at startup with `'apikey' is required in Lakehouse depot secret.`
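
As a rough illustration of the first requirement, the sketch below builds an IAM policy document that grants read, write, and delete on objects under one bucket and prefix. The function name `build_depot_policy` is hypothetical, and the `s3:ListBucket` statement is an assumption (object listing is commonly needed alongside object access, but this page does not state it). Treat the result as a starting point to review against your own security policy, not as the policy DataOS requires.

```python
import json


def build_depot_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document scoped to one bucket and an optional prefix."""
    prefix = prefix.strip("/")
    key_prefix = f"{prefix}/" if prefix else ""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read, write, and delete objects under the configured prefix.
                "Sid": "ObjectReadWriteDelete",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{key_prefix}*",
            },
            {
                # Assumption: listing the bucket is also needed; not stated on this page.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # Values taken from the Depot example above.
    print(json.dumps(build_depot_policy("my-iceberg-bucket", "warehouse/"), indent=2))
```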

## Sink address

Reference the Lakehouse Depot by name; the address format is the same for every cloud backend:

```
dataos://<lakehouse-depot>
```

Authoring a `lakehouse://` URI directly in a manifest is not supported. Always go through the Depot.
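
A manifest linter could enforce this rule before deployment. The following is a minimal sketch using only the Python standard library; `depot_name_from_address` is a hypothetical helper and does not reflect how Nilus itself parses or resolves addresses.

```python
from urllib.parse import urlparse


def depot_name_from_address(address: str) -> str:
    """Return the Depot name from a dataos://<lakehouse-depot> sink address."""
    parsed = urlparse(address)
    # Only Depot references are accepted; lakehouse:// and other schemes are rejected.
    if parsed.scheme != "dataos":
        raise ValueError(
            f"expected a dataos:// address, got scheme {parsed.scheme!r}"
        )
    if not parsed.netloc:
        raise ValueError("address is missing the Depot name")
    return parsed.netloc
```

For example, `depot_name_from_address("dataos://aws-lakehouse-depot")` yields the Depot name, while a `lakehouse://` address raises `ValueError`.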

## Sink options

| Option                 | Required    | Description                                                                                                                                                         |
| ---------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dest_table`           | Yes         | Target Iceberg table in `<schema>.<table>` form. Exactly two dot-separated parts.                                                                                   |
| `incremental_strategy` | Yes         | One of `replace`, `append`, `merge`.                                                                                                                                |
| `aws_region`           | Conditional | Required when the region is not already on the Depot or in `AWS_REGION`.                                                                                            |
| `aws_endpoint`         | No          | Optional endpoint override for S3-compatible object stores.                                                                                                         |
| `partition_by`         | No          | Iceberg partition spec (list of column / transform pairs). See [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md). |
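
The constraints in this table lend themselves to a quick pre-flight check. The sketch below is an assumption-laden illustration rather than Nilus's own validation: `validate_sink_options` and its `region_known_elsewhere` flag (meaning the region is already set on the Depot or in `AWS_REGION`) are hypothetical names, and `partition_by` is not checked because its exact shape is documented elsewhere.

```python
ALLOWED_STRATEGIES = ("append", "merge", "replace")


def validate_sink_options(options: dict, region_known_elsewhere: bool = False) -> list:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []

    # dest_table must have exactly two non-empty, dot-separated parts.
    dest_table = str(options.get("dest_table") or "")
    parts = dest_table.split(".")
    if len(parts) != 2 or not all(parts):
        problems.append("dest_table must be exactly <schema>.<table>")

    # incremental_strategy is required and limited to three values.
    if options.get("incremental_strategy") not in ALLOWED_STRATEGIES:
        problems.append(
            "incremental_strategy must be one of: " + ", ".join(ALLOWED_STRATEGIES)
        )

    # aws_region is conditional: needed only when no other source provides it.
    if not options.get("aws_region") and not region_known_elsewhere:
        problems.append("aws_region is required when not set on the Depot or in AWS_REGION")

    return problems
```

Returning a list of problems instead of raising on the first one lets a reviewer see every issue in a manifest in a single pass.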

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch ingestion

```yaml
name: nilus-aws-lakehouse-batch
version: v1alpha
type: nilus
spec:
  type: batch
  compute: universe-compute
  source:
    address: dataos://postgres-source
    options:
      source_table: public.orders
  sink:
    address: dataos://aws-lakehouse-depot
    options:
      dest_table: sales.orders_snapshot
      incremental_strategy: merge
      aws_region: us-east-1
```

### CDC ingestion

```yaml
name: nilus-aws-lakehouse-cdc
version: v1alpha
type: nilus
spec:
  type: cdc
  compute: universe-compute
  source:
    address: dataos://mssql-cdc-depot
    cdc:
      table.include.list: "dbo.orders"
      topic.prefix: "orders_cdc"
  sink:
    address: dataos://aws-lakehouse-depot
    options:
      dest_table: sales.orders_cdc
      incremental_strategy: append
      aws_region: us-east-1
```

## Behavior and capabilities

* **Table format**: Apache Iceberg. Nilus writes Parquet data files to S3 and registers / updates the resulting tables in the Iceberg REST catalog.
* **Object model**: Iceberg tables addressed as `<schema>.<table>`.
* **Supported pipeline modes**: `batch` and `cdc`.
* **Supported incremental strategies**: `replace`, `append`, `merge`.
* **Storage layer**: Amazon S3, or an S3-compatible object store when an endpoint override is configured (`endpoint` on the Depot or `aws_endpoint` in sink options).

## Troubleshooting

| Symptom                                                                    | Likely cause                                                                                | Resolution                                                                                                         |
| -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `METASTORE_URL environment variable is required for Lakehouse destination` | The Depot did not project `METASTORE_URL` into the runtime.                                 | Re-check the Lakehouse Depot configuration; verify the Iceberg REST catalog endpoint is set on the Depot resource. |
| `Unsupported scheme: <X>. Expected 'lakehouse://'`                         | The sink address is not a `dataos://` URI and did not resolve to a Lakehouse Depot.         | Always reference the Lakehouse via `dataos://<lakehouse-depot>`.                                                   |
| `Table name must be in the format <schema>.<table>`                        | `dest_table` is missing the schema prefix or has more than two parts.                       | Use exactly `<schema>.<table>`.                                                                                    |
| S3 write fails with `403` / `AccessDenied`                                 | Storage credentials lack write or delete permission on the configured prefix.               | Re-check the Depot secret and the IAM / role bindings on the bucket.                                               |
| `NoSuchBucket` or region mismatch                                          | The Depot's region is not set, or the Depot region disagrees with the runtime `AWS_REGION`. | Set `aws_region` in sink options or on the Depot resource.                                                         |
| Iceberg commit fails with `409 Conflict`                                   | Concurrent writers from another pipeline are competing on the same target table.            | Serialize writes per Iceberg table, or use distinct destination tables per pipeline.                               |

## Related docs

* [Azure-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/azure-backed.md)
* [GCP-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/gcp-backed.md)
* [Understanding Batch Pipeline Config](/concepts/resources/nilus/batch/pipeline-config.md)
* [Understanding CDC Pipeline Config](/concepts/resources/nilus/cdc/service-config.md)
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy` and dataset-shape tuning for Iceberg targets.

