# Azure-backed

The Azure-backed DataOS Lakehouse destination writes [Apache Iceberg](https://iceberg.apache.org/docs/latest/) tables into a Lakehouse Depot whose storage layer is Azure Data Lake Storage Gen2 (ADLS Gen2, addressed through the ABFSS scheme). Nilus uses the same Lakehouse connector for every cloud: the storage backend is chosen by the Lakehouse Depot, not by a separate connector type. AWS-backed, Azure-backed, and GCP-backed Lakehouse Depots all share the same pipeline shape, sink address pattern, and table behavior. This page covers the Azure / ABFSS variant; the [AWS-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/aws-backed.md) and [GCP-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/gcp-backed.md) variants are documented separately.

> Legacy WASBS-backed Azure Depots are still supported by the runtime, but they should be re-modeled as ABFSS before production rollout. The Depot spec on this page uses ABFSS; the credential shape is identical for both.

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach the configured ADLS Gen2 endpoint and the Iceberg REST catalog identified by `METASTORE_URL`.
* The Lakehouse Depot must be defined in DataOS with `storageType: abfss` (or `wasbs` for legacy Depots) and a matching `spec.abfss` (or `spec.wasbs`) block, and it must be reachable from the runtime cluster.

### Lakehouse Depot shape

For an Azure-backed Lakehouse, the Depot resource carries:

```yaml
spec:
  storageType: abfss
  abfss:
    container: my-iceberg-container
    account: myiceberglake
    relativePath: warehouse/         # optional sub-prefix
```

The runtime resolves the storage URL as `abfss://<container>@<account>.dfs.<endpointSuffix>/<relativePath>`. The endpoint suffix defaults to `core.windows.net` and can be overridden through the `az_endpoint_suffix` key of the Depot secret.
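
The URL pattern above can be sketched in a few lines of Python. This is only an illustration of the string layout, not the runtime's actual code: the function name `resolve_abfss_url` and its parameters are hypothetical, and the handling of a leading slash in `relativePath` is an assumption.

```python
def resolve_abfss_url(container: str, account: str, relative_path: str = "",
                      endpoint_suffix: str = "core.windows.net") -> str:
    """Build abfss://<container>@<account>.dfs.<endpointSuffix>/<relativePath>.

    Hypothetical helper; the real resolution happens inside the Nilus runtime.
    """
    # Assumption: a leading slash is dropped so the path separator is not doubled.
    path = relative_path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.{endpoint_suffix}/{path}"


# Values taken from the Depot spec shown on this page.
url = resolve_abfss_url("my-iceberg-container", "myiceberglake", "warehouse/")
print(url)
```

Passing `endpoint_suffix="core.usgovcloudapi.net"` to the same helper shows how a sovereign-cloud override changes only the host part of the URL.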

Storage credentials projected from the Depot's secret:

| Secret key           | Required | Notes                                                                                          |
| -------------------- | -------- | ---------------------------------------------------------------------------------------------- |
| `az_account_name`    | Yes      | The ADLS Gen2 storage account name.                                                            |
| `az_account_key`     | Yes      | Account key for the storage account.                                                           |
| `az_endpoint_suffix` | No       | Defaults to `core.windows.net`. Override for sovereign clouds (e.g. `core.usgovcloudapi.net`). |

### Permissions

* The Depot's storage credentials must be able to read, write, and delete blobs under the configured container / prefix (typically the `Storage Blob Data Contributor` role on the container).
* The runtime must be authorized to register tables in the Iceberg REST catalog. The Lakehouse Depot secret must include an `apikey` (a DataOS API token). Nilus reads it as `LAKEHOUSE_APIKEY` and passes it to the Iceberg REST catalog as the connection token. If the secret has no `apikey`, the pipeline fails at startup with `'apikey' is required in Lakehouse depot secret.`
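
The catalog prerequisites above amount to a small pre-flight check, sketched here under stated assumptions. The function `catalog_settings`, the exception types, and the `uri` / `token` keys of the returned dictionary are illustrative choices; the two error messages are the ones quoted on this page.

```python
def catalog_settings(env: dict, secret: dict) -> dict:
    """Hypothetical pre-flight check for the Iceberg REST catalog connection."""
    metastore_url = env.get("METASTORE_URL")
    if not metastore_url:
        raise RuntimeError(
            "METASTORE_URL environment variable is required for Lakehouse destination"
        )
    apikey = secret.get("apikey")
    if not apikey:
        raise ValueError("'apikey' is required in Lakehouse depot secret.")
    # Assumption: the API key is handed to the catalog client as its token.
    return {"uri": metastore_url, "token": apikey}
```

In a real process `env` would typically be `os.environ`; a plain dictionary is used here so the check can be exercised without touching the environment.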

## Sink address

Reference the Lakehouse Depot by name; the address format is the same for every cloud backend:

```
dataos://<lakehouse-depot>
```

Authoring a `lakehouse://` URI directly in a manifest is not supported. Always go through the Depot.
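
A manifest linter could enforce the address rule with a check along these lines. This is a minimal sketch using only the Python standard library; `depot_name_from_address` and its error message are hypothetical and are not part of Nilus.

```python
from urllib.parse import urlsplit


def depot_name_from_address(address: str) -> str:
    """Return the Depot name from a dataos://<lakehouse-depot> sink address."""
    parts = urlsplit(address)
    # Reject anything that is not a dataos:// address with a Depot name.
    if parts.scheme != "dataos" or not parts.netloc:
        raise ValueError(f"Expected dataos://<lakehouse-depot>, got: {address!r}")
    return parts.netloc


depot = depot_name_from_address("dataos://azure-lakehouse-depot")
```

A `lakehouse://` URI, or a bare Depot name without a scheme, raises `ValueError` in this sketch.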

## Sink options

| Option                 | Required | Description                                                                                                                                                         |
| ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dest_table`           | Yes      | Target Iceberg table in `<schema>.<table>` form. Exactly two dot-separated parts.                                                                                   |
| `incremental_strategy` | Yes      | One of `replace`, `append`, `merge`.                                                                                                                                |
| `partition_by`         | No       | Iceberg partition spec (list of column / transform pairs). See [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md). |

> Azure-backed Lakehouses do not take cloud-specific sink options. Storage account, container, and credentials all come from the Depot.
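
The constraints on the two required options can be checked before a manifest is submitted. The sketch below is an assumption about how such a check might look: `validate_sink_options` is a hypothetical helper, and the `incremental_strategy` error message is invented for illustration. The table-name message is the one listed under Troubleshooting.

```python
ALLOWED_STRATEGIES = {"replace", "append", "merge"}


def validate_sink_options(options: dict) -> None:
    """Hypothetical check of the required sink options; raises ValueError."""
    parts = options.get("dest_table", "").split(".")
    # Exactly two non-empty, dot-separated parts: <schema>.<table>.
    if len(parts) != 2 or not all(parts):
        raise ValueError("Table name must be in the format <schema>.<table>")
    strategy = options.get("incremental_strategy")
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(
            f"incremental_strategy must be one of {sorted(ALLOWED_STRATEGIES)}, "
            f"got {strategy!r}"
        )


# Passes silently for the options used in the batch sample on this page.
validate_sink_options(
    {"dest_table": "sales.orders_snapshot", "incremental_strategy": "merge"}
)
```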

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch ingestion

```yaml
name: nilus-azure-lakehouse-batch
version: v1alpha
type: nilus
spec:
  type: batch
  compute: universe-compute
  source:
    address: dataos://postgres-source
    options:
      source_table: public.orders
  sink:
    address: dataos://azure-lakehouse-depot
    options:
      dest_table: sales.orders_snapshot
      incremental_strategy: merge
```

### CDC ingestion

```yaml
name: nilus-azure-lakehouse-cdc
version: v1alpha
type: nilus
spec:
  type: cdc
  compute: universe-compute
  source:
    address: dataos://mssql-cdc-depot
    cdc:
      table.include.list: "dbo.orders"
      topic.prefix: "orders_cdc"
  sink:
    address: dataos://azure-lakehouse-depot
    options:
      dest_table: sales.orders_cdc
      incremental_strategy: append
```

## Behavior and capabilities

* **Table format**: Apache Iceberg. Nilus writes Parquet data files to ADLS Gen2 and registers / updates the resulting tables in the Iceberg REST catalog.
* **Object model**: Iceberg tables addressed as `<schema>.<table>`.
* **Supported pipeline modes**: `batch` and `cdc`.
* **Supported incremental strategies**: `replace`, `append`, `merge`.
* **Storage layer**: Azure Data Lake Storage Gen2 (ABFSS). Legacy WASBS-backed Depots are still accepted, but they should be re-modeled as ABFSS before production rollout.

## Troubleshooting

| Symptom                                                                    | Likely cause                                                                                | Resolution                                                                                                         |
| -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `METASTORE_URL environment variable is required for Lakehouse destination` | The Depot did not project `METASTORE_URL` into the runtime.                                 | Re-check the Lakehouse Depot configuration; verify the Iceberg REST catalog endpoint is set on the Depot resource. |
| `Unsupported scheme: <X>. Expected 'lakehouse://'`                         | The manifest authored a non-`dataos://` URI that did not resolve to a Lakehouse Depot.      | Always reference the Lakehouse via `dataos://<lakehouse-depot>`.                                                   |
| `Table name must be in the format <schema>.<table>`                        | `dest_table` is missing the schema prefix or has more than two parts.                       | Use exactly `<schema>.<table>`.                                                                                    |
| ADLS write fails with `403` / `AuthorizationFailure`                       | The Depot's account key lacks `Storage Blob Data Contributor` permission on the container.  | Re-check the Depot secret and the role binding on the storage account / container.                                 |
| ADLS resolves to wrong endpoint (sovereign cloud)                          | The Depot did not set `az_endpoint_suffix` and the runtime defaulted to `core.windows.net`. | Set the appropriate sovereign-cloud suffix on the Depot secret.                                                    |
| Iceberg commit fails with `409 Conflict`                                   | Concurrent writers from another pipeline are competing on the same target table.            | Serialize writes per Iceberg table, or use distinct destination tables per pipeline.                               |

## Related docs

* [AWS-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/aws-backed.md)
* [GCP-backed DataOS Lakehouse](/concepts/resources/nilus/destinations/dataos-lakehouse/gcp-backed.md)
* [Understanding Batch Pipeline Config](/concepts/resources/nilus/batch/pipeline-config.md)
* [Understanding CDC Pipeline Config](/concepts/resources/nilus/cdc/service-config.md)
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy` and dataset-shape tuning for Iceberg targets.

