
# Workflow

A Workflow is a batch workload that runs steps in a directed acyclic graph, or DAG. Use it to orchestrate scheduled jobs, validations, backfills, and repeatable batch tasks.

Each DAG node is a step. Each dependency edge defines execution order. The Workflow defines orchestration. The stack defines what each step runs.

## Prerequisites

To create a Workflow, you need a tenant-specific role (**Tenant Admin** or **Data Developer**).

To use a Workflow, you need resource-specific permission granted by the Workflow owner.

## When to use

Use a Workflow when you need to:

* Run batch logic to completion.
* Trigger execution on a schedule or on demand.
* Coordinate one or more dependent steps.

## Manifest structure

Use the following structure to define a Workflow resource.

{% tabs %}
{% tab title="Syntax" %}
{% code title="workflow.yaml" expandable="true" collapsedlinecount="12" %}

```yaml
version: v2alpha
name: ${{workflow-name}}
type: workflow
tags:
  - ${{tag1}}
  - ${{tag2}}
description: ${{workflow-description}}
spec:
  schedule:
    crons:
      - ${{cron-expression}}
    timezone: UTC
    concurrencyPolicy: Forbid
  dag:
    - name: ${{step-name}}
      title: ${{step-title}}
      depends:
        - ${{upstream-step-name}}
      spec:
        compute: ${{compute-resource}}
        resources:
          requests:
            cpu: ${{cpu-request}}
            memory: ${{memory-request}}
          limits:
            cpu: ${{cpu-limit}}
            memory: ${{memory-limit}}
        stack: ${{stack-name}}
        stackSpec:
          ${{stack-specific-configuration}}
```

{% endcode %}
{% endtab %}
{% endtabs %}

## Core concepts

### Workflow and DAG

A Workflow is an execution plan for batch work. The plan is expressed as a DAG.

Use a single step for simple jobs. Add more steps when work must happen in stages. Use `depends` to describe order.

{% code title="workflow.yaml" expandable="true" collapsedlinecount="10" %}

```yaml
spec:
  dag:
    - name: extract
      spec: {}
    - name: validate
      depends:
        - extract
      spec: {}
    - name: publish
      depends:
        - validate
      spec: {}
```

{% endcode %}

### Workflow type

Workflows are typically defined as `instance` workloads.

An instance starts, processes work, and exits.
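As a minimal sketch, the execution mode can be set explicitly with `spec.type`, the field listed in the field reference and used in the projections example on this page. The step name `run-once` is a placeholder, and the empty step `spec` is left out only for brevity.

```yaml
spec:
  type: instance          # starts, processes work, and exits
  dag:
    - name: run-once      # placeholder step name
      spec: {}            # stack and stackSpec omitted for brevity
```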

### Schedule

Use `spec.schedule` to run the Workflow automatically.

Set one or more cron expressions in `crons`. Set `timezone` to control evaluation time. Use `concurrencyPolicy` to control overlapping runs.
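The fragment below sketches a schedule with two cron entries. It assumes standard five-field cron syntax, which this page does not confirm, and the specific expressions are illustrative only. `Forbid` is the only `concurrencyPolicy` value shown on this page; check the platform reference for other accepted values and their exact behavior.

```yaml
spec:
  schedule:
    crons:
      - "0 2 * * *"         # assumed five-field cron: every day at 02:00
      - "0 14 * * 1"        # assumed five-field cron: Mondays at 14:00
    timezone: UTC           # cron expressions are evaluated in this time zone
    concurrencyPolicy: Forbid
```

Quoting each cron expression keeps YAML from misreading the `*` characters.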

### Steps

Each item in `spec.dag` is a step.

Each step can define:

* A unique `name`
* Optional `depends` entries
* Its own runtime configuration in `spec`

This lets one Workflow mix simple and complex steps in the same DAG.

### Compute and resources

Use `compute` to choose the execution environment.

Use `resources.requests` and `resources.limits` to size each step. Set these values per step when different stages need different capacity.
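The sketch below sizes two steps differently within one DAG. The `transform` step name and all CPU and memory values are illustrative assumptions; `runnable-default` is the compute name used in the examples on this page. The `stack` and `stackSpec` fields are omitted to keep the focus on sizing.

```yaml
spec:
  dag:
    - name: extract
      spec:
        compute: runnable-default
        resources:
          requests:           # minimum capacity for a light step
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
    - name: transform         # hypothetical heavier step
      depends:
        - extract
      spec:
        compute: runnable-default
        resources:
          requests:           # larger request for a heavier stage
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: "1"
            memory: 2Gi
```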

### Stack and stackSpec

`stack` selects the runtime for a step. `stackSpec` contains the stack-specific configuration.

Start by designing the DAG. Then choose any stack that supports Workflow execution mode.

Use `container` for custom scripts, CLIs, and packaged batch jobs. Use another supported stack when its runtime model better fits the step.

{% code title="workflow-step.yaml" expandable="true" collapsedlinecount="8" %}

```yaml
spec:
  dag:
    - name: run-step
      spec:
        stack: ${{stack-name}}
        stackSpec:
          ${{stack-specific-configuration}}
```

{% endcode %}

### Projections and storage

Workflow steps can also consume projections and mounted volumes.

Use `use.projection` for secrets and runtime configuration. Use `use.volumes` when steps must share files across the DAG.
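The following sketch combines both on a single step. The `volumes` entries follow the shape used in the multi-step example on this page. The step-level `projection` block is an assumption: it mirrors the Workflow-level shape shown in the projections example, and the field reference lists `spec.dag[].spec.use.projection` without detailing its schema. The step name `load` and the `${{volume-id}}` and `${{secret-id}}` placeholders are hypothetical.

```yaml
spec:
  dag:
    - name: load                      # hypothetical step name
      spec:
        use:
          volumes:
            - id: ${{volume-id}}      # placeholder volume identifier
              directory: /workspace   # mount path inside the step
              readOnly: true
          projection:                 # assumed to mirror the Workflow-level shape
            secrets:
              - id: ${{secret-id}}    # placeholder secret identifier
                contextAlias: app
            projections:
              envVars:
                - key: API_TOKEN
                  value: "{{.secrets.app.API_TOKEN}}"
```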

## Workflow examples

Design the DAG first. Then attach a stack to each step.

### Single-step container Workflow

Use a single step when the job has no internal dependencies.

{% code title="daily-report.yaml" expandable="true" collapsedlinecount="14" %}

```yaml
version: v2alpha
name: daily-report
type: workflow
tags:
  - report
  - container
description: "Generate a daily report in one batch step."
spec:
  dag:
    - name: generate-report
      title: Generate report
      spec:
        compute: runnable-default
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        stack: container
        stackSpec:
          image: docker.io/library/alpine:3.20
          command:
            - sh
          arguments:
            - -c
            - |
              echo "Generating report"
              date -u
              echo '{"status":"ok"}'
```

{% endcode %}

### Multi-step DAG with a container stack

Use multiple steps when work must happen in order.

The DAG in this example has three steps that share one mounted volume:

* `extract` prepares input files.
* `validate` checks the result.
* `publish` runs only after validation passes.

{% code title="customer-batch-pipeline.yaml" expandable="true" collapsedlinecount="18" %}

```yaml
version: v2alpha
name: customer-batch-pipeline
type: workflow
tags:
  - batch
  - container
description: "Run extract, validate, and publish as a DAG"
spec:
  dag:
    - name: extract
      title: Extract source data
      spec:
        compute: runnable-default
        use:
          volumes:
            - id: ${SHARED_VOLUME_ID}
              directory: /workspace
              readOnly: false
        stack: container
        stackSpec:
          image: docker.io/library/python:3.12-alpine
          command:
            - sh
          arguments:
            - -c
            - |
              mkdir -p /workspace/raw
              printf '{"customers": 42}\n' > /workspace/raw/customers.json

    - name: validate
      title: Validate extracted data
      depends:
        - extract
      spec:
        compute: runnable-default
        use:
          volumes:
            - id: ${SHARED_VOLUME_ID}
              directory: /workspace
              readOnly: false
        stack: container
        stackSpec:
          image: docker.io/library/alpine:3.20
          command:
            - sh
          arguments:
            - -c
            - |
              test -s /workspace/raw/customers.json
              echo "validation passed"

    - name: publish
      title: Publish validated output
      depends:
        - validate
      spec:
        compute: runnable-default
        use:
          volumes:
            - id: ${SHARED_VOLUME_ID}
              directory: /workspace
              readOnly: false
        stack: container
        stackSpec:
          image: docker.io/library/alpine:3.20
          command:
            - sh
          arguments:
            - -c
            - |
              mkdir -p /workspace/published
              cp /workspace/raw/customers.json /workspace/published/customers.json
              echo "publish complete"
```

{% endcode %}

### Parallel branches in a DAG

Use parallel branches when steps do not depend on each other.

In this example, `fetch-customers` and `fetch-orders` run independently. `merge-data` waits for both.

{% code title="parallel-batch-pipeline.yaml" expandable="true" collapsedlinecount="16" %}

```yaml
version: v2alpha
name: parallel-batch-pipeline
type: workflow
tags:
  - batch
  - dag
description: "Run independent branches, then merge results"
spec:
  dag:
    - name: fetch-customers
      spec:
        stack: container
        stackSpec:
          image: docker.io/library/alpine:3.20
          command:
            - sh
          arguments:
            - -c
            - echo "fetch customers"

    - name: fetch-orders
      spec:
        stack: container
        stackSpec:
          image: docker.io/library/alpine:3.20
          command:
            - sh
          arguments:
            - -c
            - echo "fetch orders"

    - name: merge-data
      depends:
        - fetch-customers
        - fetch-orders
      spec:
        stack: container
        stackSpec:
          image: docker.io/library/alpine:3.20
          command:
            - sh
          arguments:
            - -c
            - echo "merge results"
```

{% endcode %}

### Workflow with projections

Use projections when the Workflow needs secrets or runtime context.

{% code title="projected-workflow.yaml" expandable="true" collapsedlinecount="16" %}

```yaml
version: v2alpha
name: projected-workflow
type: workflow
description: "Inject runtime values into a batch step"
spec:
  type: instance
  use:
    projection:
      secrets:
        - id: workflow-secret
          contextAlias: app
      projections:
        envVars:
          - key: WORKFLOW_ID
            value: "{{.defaultProjections.dataOsRunId}}"
          - key: API_TOKEN
            value: "{{.secrets.app.API_TOKEN}}"
  dag:
    - name: run-task
      spec:
        stack: container
        stackSpec:
          image: docker.io/library/alpine:3.20
          command:
            - sh
          arguments:
            - -c
            - |
              echo "Workflow: $WORKFLOW_ID"
              test -n "$API_TOKEN"
```

{% endcode %}

## Workflow lifecycle

Workflows are runnable resources. You can inspect their runtime details and logs from creation through completion. The commands below use `echo-workflow` as an example Workflow name.

### Create or update a Workflow

```bash
dataos-ctl resource apply -f /home/workflow.yaml
```

### Get details for a specific Workflow

```bash
dataos-ctl resource get -t workflow -n echo-workflow -d
```

### View runtime details

```bash
dataos-ctl resource runtime get -t workflow -n echo-workflow
```

### List Workflows

```bash
dataos-ctl resource get -t workflow
```

### List Workflows across all Workspaces

```bash
dataos-ctl resource get -t workflow -a
```

### View execution logs

```bash
dataos-ctl resource log -t workflow -n echo-workflow
```

### View logs for a specific container

```bash
dataos-ctl resource log -t workflow -n echo-workflow --container extract
```

### Delete a Workflow

```bash
dataos-ctl resource delete -t workflow -n echo-workflow
```

## Workflow states

Workflows move through these common states:

* `creating`
* `active`
* `running`
* `succeeded`
* `failed`
* `deleted`

## Field reference

| Field                                | Description                                           |
| ------------------------------------ | ----------------------------------------------------- |
| `version`                            | API version of the Workflow resource.                 |
| `name`                               | Unique name of the Workflow.                          |
| `type`                               | Resource type. Set this to `workflow`.                |
| `tags`                               | Labels used for grouping and filtering.               |
| `description`                        | Short summary of the Workflow.                        |
| `spec`                               | Main configuration block for execution settings.      |
| `spec.type`                          | Execution mode for the Workflow. Commonly `instance`. |
| `spec.use.projection`                | Workflow-level projections for secrets and config.    |
| `spec.dependencies`                  | Resource-level prerequisites before execution starts. |
| `spec.schedule`                      | Scheduling configuration for automatic execution.     |
| `spec.schedule.crons`                | One or more cron expressions for run timing.          |
| `spec.schedule.timezone`             | Time zone used to evaluate the cron schedule.         |
| `spec.schedule.concurrencyPolicy`    | Controls whether overlapping runs are allowed.        |
| `spec.dag`                           | List of batch steps in the Workflow DAG.              |
| `spec.dag[].name`                    | Name of an individual step.                           |
| `spec.dag[].title`                   | Optional human-readable title for a step.             |
| `spec.dag[].depends`                 | Step dependencies inside the DAG.                     |
| `spec.dag[].spec`                    | Runtime configuration for that step.                  |
| `spec.dag[].spec.compute`            | Compute target used for the step.                     |
| `spec.dag[].spec.resources.requests` | Minimum CPU and memory requested for execution.       |
| `spec.dag[].spec.resources.limits`   | Maximum CPU and memory allowed for execution.         |
| `spec.dag[].spec.runAsUser`          | User identity used to run the step.                   |
| `spec.dag[].spec.use.volumes`        | Mounted volumes available to the step.                |
| `spec.dag[].spec.use.projection`     | Injected secrets and configuration for the step.      |
| `spec.dag[].spec.stack`              | Runtime stack used to execute the step.               |
| `spec.dag[].spec.stackSpec`          | Stack-specific configuration details.                 |

