
# Building a Custom Source

Custom sources extend Nilus to upstream systems that are not yet covered by a built-in connector. The source logic lives in a Git repository, is synced into the runtime through the manifest's `repo` block, and is invoked through a `custom://` source address.

## When to use a custom source

* The upstream system is not available in the supported source catalog.
* You need source-specific authentication, pagination, or object modeling.
* You want to keep the ingestion logic in version-controlled Python code that the team owns end-to-end.

## Requirements

Before Nilus can run a custom source pipeline, it needs access to the repository that holds the code and a source implementation that satisfies the contract below.

### Repository access

* The Nilus runtime clones the Git repository declared in `spec.repo.url` (via git-sync) before the pipeline starts.
* `spec.repo.baseDir` must point to the directory that contains the custom source implementation and, optionally, its `requirements.txt`.
* Pin a branch, tag, or commit with `spec.repo.syncFlags`. A tag or commit SHA gives strictly reproducible runs, because a branch moves as new commits land.
* If the repository is private, store the git credentials in a DataOS secret and reference it from `spec.repo.secretId`.

See [Repository sync, flags, and secrets](#repository-sync-flags-and-secrets) below for the full configuration.

### Implementation contract

| Requirement       | Description                                                                                                     |
| ----------------- | --------------------------------------------------------------------------------------------------------------- |
| Source class      | Implement a custom source class that Nilus can import from the configured repository path.                      |
| Resource output   | Yield rows or documents in a stable shape so downstream schema evolution is predictable.                        |
| Parameters        | Read connector-specific parameters from the `custom://` URI and `source.options`.                               |
| Incremental state | Persist only the cursor or checkpoint needed to resume safely when the upstream system supports incrementality. |

## Implementation model

1. Subclass `CustomSource`.
2. Expose one or more resources that yield rows or documents.
3. Parse connector parameters from the `custom://` URI and `source.options`.
4. Persist incremental state only if the source supports restart-safe resumption.
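The steps above can be sketched roughly as follows. This is a minimal illustration, not the Nilus API: the `CustomSource` class here is a local stand-in so the example runs on its own, and the constructor arguments, the `resource()` method, and the `_orders` helper are all hypothetical names. Check the base class shipped with your Nilus runtime for the real method signatures.

```python
from typing import Any, Callable, Dict, Iterator

Row = Dict[str, Any]


class CustomSource:
    """Hypothetical stand-in so this sketch runs without Nilus installed."""


class MyCustomSource(CustomSource):
    def __init__(self, uri_params: Dict[str, str], options: Dict[str, Any]) -> None:
        # Step 3: connection-level values come from the custom:// query string,
        # resource-level values come from source.options.
        self.account_id = uri_params["account_id"]
        self.source_table = options["source_table"]

    def _orders(self) -> Iterator[Row]:
        # Step 2: a resource is a generator of rows with a fixed set of keys.
        # Placeholder data; a real source would page through the upstream API.
        for order_id in (1, 2):
            yield {"order_id": order_id, "account_id": self.account_id}

    def resource(self) -> Iterator[Row]:
        # Dispatch on the logical resource name requested through source_table.
        resources: Dict[str, Callable[[], Iterator[Row]]] = {"orders": self._orders}
        if self.source_table not in resources:
            raise ValueError(f"unknown source_table: {self.source_table!r}")
        return resources[self.source_table]()
```

Keeping the resource lookup in one dictionary makes it easy to see which `source_table` values the source accepts, and an unknown name fails before any upstream call is made.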

## Source options

| Option         | Required | Description                                                                                                                                          |
| -------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `source_table` | Yes      | Logical resource name exposed by the custom source. Use a stable value because it becomes part of the downstream dataset identity.                   |
| Custom options | No       | Source-specific parameters that your custom source reads from `source.options`. Keep names explicit and document them beside the custom source code. |

### URI format

```
custom://<SourceClassName>?<option>=<value>
```

Use the URI for connection-level or source-class parameters, and `source.options` for the resource-level options the custom source expects.
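As an illustration of how the address splits into a class name and parameters, the sketch below parses a `custom://` URI with Python's standard library. The helper name `parse_custom_uri` is hypothetical, and Nilus may parse the address differently internally; the point is only to show which part of the string is which.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlsplit


def parse_custom_uri(uri: str) -> Tuple[str, Dict[str, str]]:
    """Split custom://<SourceClassName>?<option>=<value> into its parts."""
    parts = urlsplit(uri)
    if parts.scheme != "custom":
        raise ValueError(f"expected a custom:// address, got {uri!r}")
    # netloc keeps the original casing of the class name
    # (the hostname attribute would lowercase it).
    class_name = parts.netloc
    if not class_name:
        raise ValueError("the custom:// address is missing a source class name")
    # Query-string pairs become the connection-level parameters.
    params = dict(parse_qsl(parts.query))
    return class_name, params
```

For the address used in the sample config, `parse_custom_uri("custom://MyCustomSource?account_id=acme")` returns `("MyCustomSource", {"account_id": "acme"})`.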

## Repository sync, flags, and secrets

Nilus loads your custom source code from Git: it clones `spec.repo` with git-sync before the pipeline runs, then imports the source class from `baseDir`. Configure it with these fields:

| Field                 | Required           | Description                                                                                                                                                                          |
| --------------------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `spec.repo.url`       | Yes                | Git URL of the repository that contains your custom source code.                                                                                                                     |
| `spec.repo.baseDir`   | Yes                | Path inside the repo to the directory holding the custom source `*.py` files (and `requirements.txt`). Nilus scans this directory for the source class named in the `custom://` URI. |
| `spec.repo.syncFlags` | No                 | Extra flags passed to git-sync, most importantly `--ref=` to pin a branch, tag, or commit.                                                                                           |
| `spec.repo.secretId`  | Private repos only | Reference to a DataOS secret holding the git credentials, written as `<workspace>:<secret-name>`.                                                                                    |

### Pinning a branch, tag, or commit (`syncFlags`)

Without `syncFlags`, git-sync clones the default branch. Pin an explicit ref so every run uses the same code:

```yaml
spec:
  repo:
    url: https://github.com/your-org/custom-connectors
    baseDir: connectors/custom-source
    syncFlags:
      - --ref=main           # branch
      # - --ref=v1.2.0       # tag
      # - --ref=<commit-sha> # commit
```

Entries under `syncFlags` are forwarded verbatim to git-sync, so any git-sync flag is valid. `--ref=` is the one almost every pipeline should set.

### Authenticating a private repo (`secretId`)

For a private repository, store the git credentials in a DataOS secret and reference it from `spec.repo.secretId`. Nilus projects that secret into the git-sync container, which reads exactly two keys:

| Key                | Description                                                                                    |
| ------------------ | ---------------------------------------------------------------------------------------------- |
| `GITSYNC_USERNAME` | Git username. For token auth, this is the token's username (e.g. `x-token-auth` on Bitbucket). |
| `GITSYNC_PASSWORD` | Password, personal access token, or app password.                                              |

1. Create the secret (a `type: secret` resource with `spec.type: key-value`):

```yaml
name: custom-source-repo-secret
version: v2alpha
type: secret
description: Git read credentials for custom source repo sync
spec:
  type: key-value
  data:
    GITSYNC_USERNAME: <git-username>
    GITSYNC_PASSWORD: <personal-access-token>
```

```bash
dataos-ctl resource apply -f custom-source-repo-secret.yaml
```

2. Reference it from the pipeline as `<workspace>:<secret-name>` (the workspace the secret was created in):

```yaml
spec:
  repo:
    url: https://bitbucket.org/your-org/custom-connectors
    baseDir: packages/custom-source
    syncFlags:
      - --ref=main
    secretId: public:custom-source-repo-secret
```

A public repository needs only `url` and `baseDir` (plus optional `syncFlags`); omit `secretId`.

{% hint style="info" %}
`spec.repo.secretId` only authenticates repo sync for the custom source code. It is **never** used for source or sink connectivity. Those still come from the `custom://` URI, `source.options`, depot inference, or `spec.use.projection`. See [Secrets and Projections](/concepts/resources/nilus/concepts/secrets-and-projections.md) (Pattern 3: repo secret inference).
{% endhint %}
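The rules in this section can be checked before a manifest is applied. The sketch below is a hypothetical pre-flight check, not part of Nilus or `dataos-ctl`: it assumes the `spec.repo` block has already been loaded into a plain dictionary, and the function name `check_repo_block` is made up for illustration.

```python
from typing import Any, Dict, List


def check_repo_block(repo: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a spec.repo mapping."""
    problems: List[str] = []

    # url and baseDir are the two required fields.
    for field in ("url", "baseDir"):
        if not repo.get(field):
            problems.append(f"spec.repo.{field} is required")

    # Without a --ref= flag, git-sync falls back to the default branch.
    flags = repo.get("syncFlags") or []
    if not any(str(flag).startswith("--ref=") for flag in flags):
        problems.append("spec.repo.syncFlags has no --ref= entry")

    # secretId is optional, but when present it must be <workspace>:<secret-name>.
    secret_id = repo.get("secretId")
    if secret_id is not None:
        workspace, sep, name = str(secret_id).partition(":")
        if not (sep and workspace and name):
            problems.append("spec.repo.secretId must be <workspace>:<secret-name>")

    return problems
```

An empty list means the block follows the conventions described above; it says nothing about whether the repository or the secret actually exists.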

## Sample Nilus config

```yaml
name: custom-source-pipeline
version: v1alpha
type: nilus
spec:
  type: batch
  compute: universe-compute
  repo:
    url: https://github.com/your-org/custom-connectors
    baseDir: connectors/custom-source
    syncFlags:
      - --ref=main
    # secretId: public:custom-source-repo-secret  # only for a private repo
  source:
    address: custom://MyCustomSource?account_id=acme
    options:
      source_table: orders
  sink:
    address: dataos://lakehouse?purpose=rw
    options:
      dest_table: raw.orders
      incremental_strategy: append
```

## Behavior and capabilities

* **Pipeline mode**: custom sources run as `spec.type: batch` pipelines.
* **Code loading**: Nilus uses the `repo` block to fetch the source implementation at runtime.
* **Object model**: the custom source owns the resource model it exposes through `source_table`.
* **State handling**: incremental behavior is implementation-specific. The custom source must define and maintain any cursor needed for restart-safe runs.
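One way to think about the state-handling point is a single cursor value that advances only after a run has handed every row downstream. The sketch below assumes a plain dictionary as the state store and an `updated_at` field holding ISO 8601 strings; both are assumptions for illustration, and how state is actually persisted between Nilus runs depends on your implementation and runtime.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def read_incremental(
    rows: Iterable[Row],
    state: Dict[str, Any],
    cursor_field: str = "updated_at",
) -> Iterator[Row]:
    """Yield only rows newer than the stored cursor, then advance the cursor."""
    last_seen = state.get("cursor")
    newest = last_seen
    for row in rows:
        value = row[cursor_field]
        # Skip anything at or before the checkpoint from the previous run.
        if last_seen is not None and value <= last_seen:
            continue
        if newest is None or value > newest:
            newest = value
        yield row
    # Reached only when the caller has consumed every row, so an interrupted
    # run keeps the old cursor and re-reads the same rows on restart.
    state["cursor"] = newest
```

Storing only the cursor keeps the state small, at the cost of re-reading some rows after a failed run, so the sink strategy should tolerate replays.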

## Troubleshooting

| Symptom                                   | Likely cause                                                                             | Resolution                                                                                                                                                                               |
| ----------------------------------------- | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Runtime cannot import the custom source   | `spec.repo.baseDir` points to the wrong folder, or the expected class/module is missing. | Verify the repository layout and class name used in the `custom://` URI.                                                                                                                 |
| Repository clone fails                    | Runtime cannot reach or authenticate to the repository.                                  | For a private repo, set `spec.repo.secretId` to a secret holding `GITSYNC_USERNAME` / `GITSYNC_PASSWORD`. See [Repository sync, flags, and secrets](#repository-sync-flags-and-secrets). |
| Pipeline runs against stale or wrong code | No ref pinned, so git-sync used the default branch.                                      | Pin the branch/tag/commit with `spec.repo.syncFlags` (e.g. `--ref=main`).                                                                                                                |
| Output schema changes unexpectedly        | The custom source yields inconsistent row/document shapes.                               | Normalize output before yielding and add tests around representative source responses.                                                                                                   |
| Pipeline restarts from the beginning      | The custom source does not persist an incremental cursor.                                | Implement checkpoint handling for sources that support incrementality.                                                                                                                   |
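For the schema-drift row above, normalizing each record to a fixed set of keys before yielding is usually enough to keep the output shape stable. The sketch below is a minimal example; the field names in `ORDER_FIELDS` and the helper `normalize_order` are hypothetical and stand in for whatever your resource exposes.

```python
from typing import Any, Dict, Mapping

# Hypothetical column list; declare it once so every row has the same keys.
ORDER_FIELDS = ("order_id", "status", "amount")


def normalize_order(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Project an upstream record onto the declared fields.

    Missing fields become None and unexpected fields are dropped, so the
    row shape does not change when the upstream payload does.
    """
    return {field: raw.get(field) for field in ORDER_FIELDS}
```

A test that feeds representative upstream responses through this function and asserts on the resulting keys catches shape changes before they reach the sink.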

## Best practices

* Keep connector code small and source-specific; shared helpers can live beside the main class.
* Design the source so retries do not create duplicate side effects upstream.
* If the source supports incrementality, store only the minimum cursor needed to resume safely.
* Validate schema shape, null handling, and pagination behavior in a non-production environment before promoting.

## Related docs

* [Custom sources](/concepts/resources/nilus/batch/custom-sources.md)
* [Batch sources](/concepts/resources/nilus/batch/batch-sources.md)
* [Understanding Batch Pipeline Config](/concepts/resources/nilus/batch/pipeline-config.md)
* [Custom query (SQL sources)](/concepts/resources/nilus/batch/custom-sources/custom-query-sql-sources.md): when the system you need is JDBC-accessible, a generic SQL custom query may be the lighter-weight option.

