# Databricks

[Databricks](https://docs.databricks.com/) is a unified platform for data engineering, analytics, and AI. Nilus writes into Databricks Unity Catalog tables for both batch and CDC pipelines through a Databricks SQL Warehouse. All tables created or loaded through this connector are **Delta tables** managed under the specified `catalog.schema`.

This page covers Databricks **as a destination**. Databricks can also be used as a batch source; see [Databricks](/concepts/resources/nilus/batch/batch-sources/databricks.md).

## Requirements

Connectivity and credentials must both be in place before the pipeline can run.

### Connectivity

* The Nilus runtime must reach the Databricks workspace hostname (`<workspace>.cloud.databricks.com`) on TCP `443`.
* The configured SQL Warehouse should be `Running` before the pipeline starts. An auto-stopped warehouse cold-starts on the first connection, but the pipeline may time out if the warm-up takes too long.
* For workspaces behind PrivateLink or IP allow-lists, verify that the runtime egress IP is allow-listed in the workspace network policy.
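
The first requirement above can be checked before a pipeline run with a plain TCP probe. The following is a minimal sketch, not part of Nilus: the helper name `can_reach` and the 5-second timeout are assumptions, and the probe only confirms that a TCP connection opens. It does not verify TLS, warehouse state, or allow-list rules enforced above the transport layer.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it says nothing about whether the
    SQL Warehouse is running or whether credentials are valid.
    """
    try:
        # create_connection resolves the hostname and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run it from the same network location as the Nilus runtime, for example `can_reach("dbc-123abc-def.cloud.databricks.com")`, since a probe from a laptop does not exercise the runtime's egress path.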

### Required parameters

| Parameter         | Required    | Default | Description                                                                                                                           |
| ----------------- | ----------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `server_hostname` | Yes         | -       | Databricks workspace host, e.g. `dbc-123abc-def.cloud.databricks.com`. Encoded as the URI authority.                                  |
| `http_path`       | Yes         | -       | SQL Warehouse HTTP path, e.g. `/sql/1.0/warehouses/abc123def456`.                                                                     |
| `catalog`         | Yes         | -       | Unity Catalog name.                                                                                                                   |
| `schema`          | Yes         | -       | Unity Catalog schema. Required by Nilus' destination routing; keep it in the URI.                                                     |
| `access_token`    | Conditional | -       | Databricks Personal Access Token, encoded as the URI password (`databricks://token:<pat>@…`). Required when not using OAuth M2M.      |
| `client_id`       | Conditional | -       | OAuth M2M service-principal client ID. Use with `client_secret` instead of `access_token`.                                            |
| `client_secret`   | Conditional | -       | OAuth M2M service-principal secret. Used with `client_id` to fetch a short-lived access token from `/oidc/v1/token` on the workspace. |

### Permissions

The token holder (PAT user or OAuth service principal) needs:

* `Can use` on the SQL Warehouse referenced by `http_path`.
* `USE CATALOG` on the catalog and `USE SCHEMA` on the schema.
* `CREATE TABLE` on the schema (only required if Nilus is expected to auto-create the destination Delta table).
* `MODIFY` on the destination tables.
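
The Unity Catalog privileges in this list can be granted with standard Databricks SQL `GRANT` statements. The sketch below renders them as strings for review; the function name `render_grants` and its parameters are hypothetical, and it covers only the privileges listed above. `Can use` on the SQL Warehouse is a workspace permission rather than a SQL privilege, so it is not rendered here.

```python
from typing import List, Sequence


def _quote(identifier: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def render_grants(
    principal: str,
    catalog: str,
    schema: str,
    tables: Sequence[str],
    create_tables: bool = True,
) -> List[str]:
    """Render GRANT statements for the privileges listed on this page.

    Hypothetical helper: it only builds SQL text and executes nothing.
    """
    who = _quote(principal)
    schema_ref = f"{_quote(catalog)}.{_quote(schema)}"
    statements = [
        f"GRANT USE CATALOG ON CATALOG {_quote(catalog)} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {schema_ref} TO {who}",
    ]
    if create_tables:
        # Only needed when Nilus is expected to auto-create the Delta table.
        statements.append(f"GRANT CREATE TABLE ON SCHEMA {schema_ref} TO {who}")
    for table in tables:
        statements.append(
            f"GRANT MODIFY ON TABLE {schema_ref}.{_quote(table)} TO {who}"
        )
    return statements
```

For example, `render_grants("nilus-sp", "main", "analytics", ["orders_cdc"])` yields four statements that a catalog owner or admin can review and run in a SQL editor.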

### URI format

PAT authentication:

```
databricks://token:<access_token>@<server_hostname>?http_path=<http_path>&catalog=<catalog>&schema=<schema>
```

OAuth M2M (service-principal) authentication:

```
databricks://<server_hostname>?client_id=<sp_client_id>&client_secret=<sp_client_secret>&http_path=<http_path>&catalog=<catalog>&schema=<schema>
```
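
Both URI forms can be assembled programmatically, which helps when a token or secret contains characters such as `@` or `&` that would otherwise break the URI. The sketch below is illustrative only: the helper names are hypothetical, and it assumes Nilus percent-decodes the URI password and query values, which this page does not state. Slashes in `http_path` are left unescaped to match the formats shown above.

```python
from urllib.parse import quote, urlencode


def build_pat_uri(access_token: str, server_hostname: str,
                  http_path: str, catalog: str, schema: str) -> str:
    """Build the PAT-style sink URI (hypothetical helper)."""
    # Percent-encode the token so reserved characters cannot split the authority.
    password = quote(access_token, safe="")
    query = urlencode(
        {"http_path": http_path, "catalog": catalog, "schema": schema},
        safe="/",  # keep the slashes in http_path readable
    )
    return f"databricks://token:{password}@{server_hostname}?{query}"


def build_oauth_uri(client_id: str, client_secret: str, server_hostname: str,
                    http_path: str, catalog: str, schema: str) -> str:
    """Build the OAuth M2M sink URI (hypothetical helper)."""
    query = urlencode(
        {
            "client_id": client_id,
            "client_secret": client_secret,
            "http_path": http_path,
            "catalog": catalog,
            "schema": schema,
        },
        safe="/",
    )
    return f"databricks://{server_hostname}?{query}"
```

In practice the credential arguments would come from environment variables or a secret store rather than literals in source code.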

> **Note** When `client_id` and `client_secret` are both present, Nilus exchanges them for a short-lived access token at `https://<server_hostname>/oidc/v1/token` before opening the SQL connection. The PAT in the URI password is ignored on this code path.
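
To debug OAuth failures independently of Nilus, the token exchange can be reproduced by hand. The sketch below builds the request for Databricks' documented client-credentials flow (HTTP Basic authentication, `grant_type=client_credentials`, `scope=all-apis`). How Nilus performs the exchange internally is not described on this page, so treat the details as assumptions; the function name `build_token_request` is hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(server_hostname: str, client_id: str,
                        client_secret: str) -> Request:
    """Build (but do not send) the OAuth M2M token request.

    Hypothetical helper that mirrors the Databricks client-credentials flow.
    """
    # HTTP Basic credentials: base64("<client_id>:<client_secret>").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return Request(
        f"https://{server_hostname}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it; a successful response is a JSON document that should carry the short-lived token in its `access_token` field, while an HTTP 401 points to a client ID and secret mismatch or missing workspace access.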

## Sink options

| Option                 | Required | Description                                                                                                                         |
| ---------------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `dest_table`           | Yes      | Either a bare table name (uses the URI's `schema`) or `<schema>.<table>` (overrides the URI's `schema`).                            |
| `incremental_strategy` | Yes      | One of `append`, `replace`, `merge`. CDC sinks should use `merge` (current state) or `append` (event log); avoid `replace` for CDC. |
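
The difference between the three strategies is easiest to see on a small in-memory example. The sketch below models a table as a list of row dictionaries. It is a simplified illustration, not Nilus' implementation: the function name is hypothetical, `replace` is assumed to overwrite the table with the incoming batch, and CDC deletes are not modeled.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def apply_strategy(target: List[Row], incoming: List[Row], strategy: str,
                   primary_key: Optional[str] = None) -> List[Row]:
    """Return the table contents after loading `incoming` (illustrative model)."""
    if strategy == "append":
        # Every incoming row is added, so repeated keys accumulate as an event log.
        return target + incoming
    if strategy == "replace":
        # Assumed semantics: existing rows are discarded in favor of the new batch.
        return list(incoming)
    if strategy == "merge":
        if primary_key is None:
            raise ValueError("merge needs a primary key to match rows")
        # Upsert: incoming rows overwrite rows that share the same key.
        merged = {row[primary_key]: row for row in target}
        for row in incoming:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown incremental_strategy: {strategy}")
```

With existing rows for ids 1 and 2 and an incoming batch for ids 2 and 3, `append` yields four rows (id 2 appears twice), `replace` yields only the incoming two, and `merge` keyed on `id` yields three rows with id 2 updated. This is why `merge` suits current-state CDC tables and `append` suits event logs.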

## Sample Nilus configs

Each example below is self-contained and uses the current Nilus pipeline shape.

### Batch ingestion (PAT)

```yaml
name: salesforce-to-databricks
version: v1alpha
type: nilus
spec:
  type: batch
  compute: universe-compute
  source:
    address: salesforce://?username={SALESFORCE_USERNAME}&password={SALESFORCE_PASSWORD}&token={SALESFORCE_TOKEN}&domain={SALESFORCE_DOMAIN}
    options:
      source_table: account
  sink:
    address: databricks://token:{DATABRICKS_TOKEN}@{DATABRICKS_HOST}?http_path={DATABRICKS_HTTP_PATH}&catalog=main&schema=analytics
    options:
      dest_table: account
      incremental_strategy: append
```

### CDC ingestion (OAuth M2M)

```yaml
name: postgres-orders-cdc
version: v1alpha
type: nilus
spec:
  type: cdc
  compute: universe-compute
  source:
    address: dataos://postgres-cdc?purpose=ro
    options:
      strategy: flatten
    cdc:
      table.include.list: "public.orders"
      topic.prefix: "orders_cdc"
  sink:
    address: databricks://{DATABRICKS_HOST}?client_id={DATABRICKS_SP_CLIENT_ID}&client_secret={DATABRICKS_SP_CLIENT_SECRET}&http_path={DATABRICKS_HTTP_PATH}&catalog=main&schema=analytics
    options:
      dest_table: orders_cdc
      incremental_strategy: merge
```

## Behavior and capabilities

* **Compute model**: the Databricks SQL Warehouse identified by `http_path` executes all DDL / DML. Nilus connects as a SQL client.
* **Object model**: Unity Catalog Delta tables. The destination is fully described by `<catalog>.<schema>.<table>`.
* **Supported pipeline modes**: `batch` and `cdc`.
* **`dest_table` resolution**: if `dest_table` contains a dot, its schema part overrides the URI's `schema`; if it is a bare table name, Nilus joins it with the `schema` from the URI. If `dest_table` is a bare name and the URI has no `schema`, the destination raises `Table name must be in the format <schema>.<table>, or specify schema in the URI`.
* **`merge` requirements**: define a `primary_key` on the dataset in the manifest so the `MERGE INTO` statement can match incoming rows to existing rows in the target table. Without a key, merges can produce duplicate rows.
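
The `dest_table` resolution rule described above can be sketched as a small function. This is an illustration of the documented behavior rather than Nilus' actual code; the function name and signature are hypothetical, and only the error message is taken from this page.

```python
from typing import Optional, Tuple


def resolve_dest_table(dest_table: str,
                       uri_schema: Optional[str] = None) -> Tuple[str, str]:
    """Return (schema, table) following the documented resolution rule."""
    if "." in dest_table:
        # A dotted name carries its own schema and overrides the URI's schema.
        schema, table = dest_table.split(".", 1)
        return schema, table
    if uri_schema:
        # A bare name is joined with the schema from the URI.
        return uri_schema, dest_table
    raise ValueError(
        "Table name must be in the format <schema>.<table>, "
        "or specify schema in the URI"
    )
```

For example, `resolve_dest_table("orders_cdc", "analytics")` resolves to the `analytics` schema, while `resolve_dest_table("staging.orders_cdc", "analytics")` resolves to `staging`.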

## Troubleshooting

| Symptom                                                                           | Likely cause                                                                        | Resolution                                                                                                       |
| --------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `Databricks URI must include a server hostname`                                   | The URI authority is empty.                                                         | Encode the hostname into the URI authority (`databricks://token:<pat>@<host>...`).                               |
| `Databricks URI must include an access token or client_id/client_secret`          | Neither PAT nor OAuth credentials supplied.                                         | Use either the PAT form (`token:<pat>@`) or the OAuth M2M form (`client_id` + `client_secret`).                  |
| `Failed to obtain Databricks OAuth token: HTTP 401`                               | OAuth client ID / secret mismatch, or the service principal lacks workspace access. | Rotate the secret, confirm the SP is added to the workspace, and confirm the SP has access to the SQL Warehouse. |
| `Invalid HTTP Path` / "Endpoint is not running"                                   | SQL Warehouse stopped, deleted, or the `http_path` has the wrong warehouse ID.      | Restart the warehouse, then confirm that `http_path` matches the path in the "JDBC URL" copied from the Databricks UI. |
| `Table name must be in the format <schema>.<table>, or specify schema in the URI` | `dest_table` is a bare table name and the URI has no `schema` query parameter.      | Either add `schema=<schema>` to the URI or supply `<schema>.<table>` in `dest_table`.                            |
| `merge` produces duplicate rows                                                   | Manifest is missing a primary key.                                                  | Add a `primary_key` to the dataset spec; Databricks needs it to drive the `MERGE` join.                          |
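
The first two symptoms in the table can be caught before deployment with a local pre-flight check on the sink URI. The sketch below reuses the error messages from this page, but the function itself is hypothetical and only approximates the validation Nilus performs.

```python
from urllib.parse import parse_qs, urlsplit


def check_sink_uri(uri: str) -> str:
    """Return "pat" or "oauth" for a well-formed sink URI, else raise ValueError.

    Hypothetical pre-flight check; it does not contact Databricks.
    """
    parts = urlsplit(uri)
    query = parse_qs(parts.query)  # parameters with blank values are dropped
    if not parts.hostname:
        raise ValueError("Databricks URI must include a server hostname")
    has_pat = bool(parts.password)
    has_oauth = "client_id" in query and "client_secret" in query
    if not (has_pat or has_oauth):
        raise ValueError(
            "Databricks URI must include an access token or client_id/client_secret"
        )
    # OAuth takes precedence: the PAT is ignored when both credentials are present.
    return "oauth" if has_oauth else "pat"
```

Running the check against the rendered `sink.address` value surfaces a missing hostname or missing credentials without waiting for a pipeline run to fail.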

## Related docs

* [Databricks](/concepts/resources/nilus/batch/batch-sources/databricks.md): companion batch source connector.
* [Optimize Sink Datasets](/concepts/resources/nilus/pipeline-optimization/optimize-sink-datasets.md): guidance on `incremental_strategy` and dataset-shape tuning for Delta tables.
