# Additional Computes

During initial DataOS setup, a base Compute Resource is created and attached to the Dataplane and Tenant.

Once the Dataplane is connected and the initial Compute is available, you can provision more Compute Resources for workload needs such as:

* Data processing
* Query execution
* Machine learning workloads

Compute abstracts the underlying cloud infrastructure and exposes standardized compute pools that DataOS Resources such as Vulcan, Nilus, Minerva Clusters, Lakehouses, and Applications consume.

Create additional Compute Resources with a manifest in this format:

```yaml
name: ${COMPUTE_NAME}
version: v2alpha
type: compute
tags:
  - dataos:type:resource
  - dataos:type:tenant-resource
  - dataos:resource:compute
  - dataos:tenant:product-sandbox
spec:
  dataplane: ${DATAPLANE_ID}  # name of an existing Dataplane Resource
  nodeSelector:
    dataos.io/purpose: core-kernel
  nodePool:
    labels:
      dataos.io/purpose: core-kernel
```

## Configuration attributes

Declare Compute-specific attributes inside the `spec` key:

| **Attribute**            | **Data Type**      | **Requirement** | **Description**                                                                         |
| ------------------------ | ------------------ | --------------- | --------------------------------------------------------------------------------------- |
| `spec.dataplane`         | string             | Required        | Name of the Dataplane Resource for which the Compute Resource is being created          |
| `spec.nodeSelector`\*    | map\[string]string | Required        | Node-level scheduling constraint. Pods land only on matching nodes.                     |
| `spec.aliases`           | \[]string          | Optional        | Alternative names for backward compatibility and migration scenarios.                   |
| `spec.nodePool.labels`\* | map\[string]string | Optional        | Pool-level labels used for grouping and autoscaling.                                    |
| `spec.nodePool.taints`   | \[]Taint           | Optional        | Kubernetes taints requiring workloads to explicitly tolerate scheduling on these nodes. |

#### \* Notes

**`nodeSelector`**

**Description:** Use `nodeSelector` to specify key-value properties for node selection. Set the desired node labels in the Compute spec, and Kubernetes schedules the Pod only onto nodes that match every label.

| **Data Type** | **Requirement** | **Default Value** | **Possible Value**                    |
| ------------- | --------------- | ----------------- | ------------------------------------- |
| mapping       | mandatory       | none              | dataos.io/purpose: runnable/query/gpu |

**Additional details:** For other custom Compute Resources, the key-value pair depends on the cloud provider.\
\
**Example usage:**

```yaml
nodeSelector:
  dataos.io/purpose: runnable
```

**`nodePool.labels`**

**Description:** Use `nodePool.labels` to declare key-value labels that describe the node pool itself. These labels group nodes at the pool level and are used by autoscalers.

| **Data Type** | **Requirement** | **Default Value** | **Possible Value**                    |
| ------------- | --------------- | ----------------- | ------------------------------------- |
| mapping       | optional        | none              | dataos.io/purpose: runnable/query/gpu |

***

{% hint style="info" %}
**Do not confuse `nodeSelector` with `nodePool.labels`**\
\
`nodeSelector` is the strict node-level filter.\
`nodePool.labels` describes the pool and is used by autoscalers.
{% endhint %}

### Lifecycle: apply, inspect, delete

Use the standard DataOS CLI flow:

#### Apply

```bash
# Create or update the Compute Resource from its manifest
dataos-ctl resource apply -f artifacts/compute/compute.yaml
```

#### Inspect

```bash
# Show the manifest as Poros sees it
dataos-ctl resource get -t compute -n <compute-name>
```

#### Delete

```bash
dataos-ctl resource delete -f artifacts/compute/compute.yaml
# or
dataos-ctl resource delete -t compute -n <compute-name>
```

{% hint style="danger" %}
Deleting a `compute` that other workloads still reference leaves those workloads unschedulable on their next reconcile. Verify with `resource get` first and migrate consumers before deleting.
{% endhint %}
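
Before deleting, it can help to search your local manifests for references to the compute. The sketch below is one way to do that, assuming workload manifests live under a local directory such as `artifacts/` and reference the compute with a `compute: <name>` line, as in the `app` example on this page. The `find_compute_refs` helper is hypothetical and not part of `dataos-ctl`. It only sees files on disk, not Resources already deployed to the Tenant.

```bash
# Hypothetical helper: list local manifest lines that reference a compute by name.
# Usage: find_compute_refs <compute-name> [manifest-dir]   (dir defaults to "artifacts")
find_compute_refs() {
  # Match lines such as "  compute: fipsdev-compute", allowing trailing spaces.
  # The name is used as a regular expression, so a "." in it matches any character.
  grep -rnE "^[[:space:]]*compute:[[:space:]]*$1[[:space:]]*\$" "${2:-artifacts}"
}
```

Running `find_compute_refs fipsdev-compute artifacts` prints each matching file and line number. No output means no local manifest references that name. Repeat the search for every entry in `spec.aliases`.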

<details>

<summary>Full manifest</summary>

```yaml
name: compute-analytics                      # required, identifier used by workloads
version: v2alpha                             # required
type: compute                                # required
description: "High-memory compute for analytics workloads"  # optional metadata
tags:                                        # optional metadata
  - compute
  - analytics
owner: infrastructure-team                   # optional metadata
spec:
  dataplane: dp-prod-01                      # REQUIRED: must reference an existing dataplane
  aliases:                                   # OPTIONAL: alternative names other workloads can use
    - analytics-compute
    - big-memory-compute
  nodeSelector:                              # REQUIRED: must be a non-empty map
    dataos.io/purpose: analytics
    workload-type: memory-intensive
  nodePool:                                  # OPTIONAL: node-pool level configuration
    labels:                                  # OPTIONAL: pool-level labels for grouping and autoscaling
      dataos.io/purpose: analytics
    taints:                                  # OPTIONAL: taints that workloads must tolerate to run on this pool
      - key: dedicated
        value: analytics
        effect: NoSchedule
```

</details>

### How DataOS Resources consume a `compute`

A workload like `app`, `minerva`, or `vulcan` references a `compute` by name to set the infrastructure environment where it runs.

```yaml
# from artifacts/apps/app.yaml  (dynamic resource type "app")
type: app
spec:
  compute: fipsdev-compute
```

***

### Best practices

1. **Name by intent, not by hardware**: Prefer `analytics-compute`, `ml-gpu-compute`, `core-kernel-compute` over `m5-xlarge-compute`. Hardware changes; intent doesn't.
2. **Pair each compute with a clear node-pool taint** when it targets premium hardware (GPU, high-memory). The taint stops cheap workloads from stealing the node.
3. **Use `aliases`** when renaming a compute so existing workloads keep working until you migrate them.
4. **Keep `dataplane` correct**: A common mistake is pointing a `compute` at a Dataplane that does not exist in this Tenant. Poros rejects it at apply time. Confirm with `dataos-ctl resource get -t dataplane -n <id>` first.
5. **Use per-job compute in workflows**: Give cheap steps cheap compute and expensive steps premium compute. Do not force one compute on a multi-stage DAG.
6. **Document each compute in `description` and `tags`**: Other teams in the Tenant often see only a name like `analytics-compute`. A two-line description prevents wrong usage.
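
Practice 4 can be scripted as a guard that confirms the Dataplane exists before applying the Compute manifest. The sketch below uses only the `dataos-ctl` commands shown on this page and assumes that `resource get` exits with a non-zero status when the Resource is missing, which you should verify for your CLI version. `apply_compute` is a hypothetical helper name.

```bash
# Hypothetical helper: apply a Compute manifest only if its Dataplane exists.
# Usage: apply_compute <dataplane-id> <manifest-path>
apply_compute() {
  # Assumption: "resource get" returns non-zero when the Dataplane is not found.
  if ! dataos-ctl resource get -t dataplane -n "$1" >/dev/null; then
    echo "dataplane '$1' not found; not applying $2" >&2
    return 1
  fi
  dataos-ctl resource apply -f "$2"
}
```

For example, `apply_compute dp-prod-01 artifacts/compute/compute.yaml` stops with an error message instead of applying when the lookup fails.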

***

### Troubleshooting

<details>

<summary><code>Validation error: dataplane is required</code></summary>

* **Likely cause:** `spec.dataplane` is empty or the environment variable was not substituted.
* **Where to look:** Check `envs/usecases.env`, then re-render the manifest.
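
A quick way to catch an unsubstituted variable is to scan the rendered manifest for leftover `${VAR}` placeholders before applying it. This is a minimal sketch: `check_placeholders` is a hypothetical helper, and how you render the manifest depends on your own tooling.

```bash
# Hypothetical helper: fail if a manifest still contains ${VAR}-style placeholders.
# Usage: check_placeholders <manifest-path>
check_placeholders() {
  # grep prints each offending line with its line number and succeeds on a match.
  if grep -nE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$1"; then
    echo "unsubstituted placeholders found in $1" >&2
    return 1
  fi
}
```

Calling `check_placeholders artifacts/compute/compute.yaml` returns a non-zero status and lists the affected lines when a placeholder such as `${DATAPLANE_ID}` is still present.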

</details>

<details>

<summary><code>dataplane not found: &#x3C;id></code></summary>

* **Likely cause:** The Dataplane Resource does not exist in this Tenant.
* **Where to look:** Run `dataos-ctl resource get -t dataplane -n <id>`.

</details>

<details>

<summary><code>Validation error: nodeSelector must be non-empty</code></summary>

* **Likely cause:** `spec.nodeSelector` is missing or empty.
* **Where to look:** Add at least one label under `spec.nodeSelector`.

</details>

<details>

<summary>Workload stuck in <code>Pending</code> after applying</summary>

* **Likely cause:** No nodes match the `nodeSelector`, or the workload does not tolerate the taints on the matching nodes.
* **Where to look:** Run `kubectl get nodes --show-labels` on the dataplane cluster.
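
To narrow the search, `kubectl` can filter nodes by label with `-l`, where comma-separated pairs must all match, mirroring how `nodeSelector` requires every label. This is a minimal sketch, assuming you have `kubectl` access to the dataplane cluster. `nodes_matching` is a hypothetical wrapper.

```bash
# Hypothetical helper: list nodes that carry every given key=value label.
# Usage: nodes_matching <key=value> [<key=value> ...]
# The subshell body keeps the IFS change local to this call.
nodes_matching() (
  IFS=,
  # "$*" joins the arguments with commas, the AND form of a label selector.
  kubectl get nodes -l "$*" --show-labels
)
```

For the full manifest on this page, `nodes_matching dataos.io/purpose=analytics workload-type=memory-intensive` lists only nodes that satisfy both labels. If no nodes are listed, no node carries every label in the selector.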

</details>

<details>

<summary>Workload schedules on the wrong nodes</summary>

* **Likely cause:** Two computes share the same `nodeSelector`.
* **Where to look:** Make selectors disjoint. Use `nodePool.taints` for premium pools.

</details>

<details>

<summary>Renaming broke other workloads</summary>

* **Likely cause:** Existing workloads still reference the old compute name.
* **Where to look:** Add the old name to `spec.aliases` before deleting, then migrate consumers.

</details>

