# May 2026

|                  |                                      |
| ---------------- | ------------------------------------ |
| **Release**      | DataOS 2.0 (`<codename>-v2.0.0`)     |
| **Release date** | `<DD Month 2026>`                    |
| **Status**       | `<General Availability / Preview>`   |
| **Upgrade from** | None. This is the first 2.0 release. |

We're rolling out DataOS 2.0. This release is a collaborative effort across the whole team. It reflects the feedback we heard on DataOS 1.0, adds the features you asked for, improves the experience end to end, and supports the core business needs the platform now serves. We're keeping the door open: feedback, suggestions, and open questions are all welcome, so please reach out.

DataOS 2.0 changes how you spend your day. Less time wiring tools together, chasing broken pipelines, and asking around about which table to trust. More time shipping data that people, applications, and AI agents actually use. You write the logic. DataOS handles the infrastructure, the contracts, the lineage, and the access, through a GUI, a CLI, or APIs.

The fastest way to feel the difference is to ship something. You can have a working Data Product running against your own engine in a single sitting. [Build your first Data Product](/build/get-started/build-your-first-data-product.md).

***

## Start where you are

Skip to the parts of this release that match what you do.

| You are                           | Start with                                                                                                                                                                          |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| A data engineer building products | [Build on your engine](#build-on-your-engine), [See changes before you ship](#see-what-a-change-breaks-before-you-ship-it), [Stop bad data at the door](#stop-bad-data-at-the-door) |
| A data analyst or consumer        | [Trust data before you query it](#trust-data-before-you-query-it), [Serve it to BI and apps](#serve-it-to-bi-and-apps)                                                              |
| Working with AI agents            | [Let AI query your data safely](#let-ai-query-your-data-safely)                                                                                                                     |
| A CTO or platform owner           | [The shift behind 2.0](#the-shift-behind-2-0), [Stand up a tenant with code](#stand-up-a-tenant-with-code-not-tickets), [Release details](#release-details)                         |
| A program manager or steward      | [The shift behind 2.0](#the-shift-behind-2-0), [Governance and access](#governance-and-access), [Known limitations](#known-limitations)                                             |

Each feature section below ends with a **What changed** drawer. Open it to see the exact list of new capabilities, improvements, and fixes that landed in that area.

***

## Build on your engine

You don't move your data to use DataOS, and you don't learn a new runtime. Point it at Snowflake, BigQuery, Databricks, or Postgres, or use the DataOS Lakehouse with Spark or Trino. Write your transformations in SQL or Python, run them where the data already sits, and build your first Data Product the same afternoon. No migration, no rip-and-replace.

A few specifics that matter once you start:

* A Data Product is a folder of YAML and code. Models declare their own dependencies with `FROM` references, so the run order, the backfill scope, and the lineage all come out of one graph.
* Physical models combine the table definition and the logic that updates it. No separate "job" to chase.
* Python models run on the engine's native compute: Snowpark on Snowflake, BigFrames on BigQuery, PySpark on Databricks. Python 3.10, tables only.
* `vulcan plan` and `vulcan run` cover the full loop: diff, backfill prompt, audits, DQ, and a persisted plan ID you can roll back to.
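
To make the "one graph" idea concrete, here is a minimal sketch in plain Python. It is not Vulcan's implementation: the model names, the SQL, and the naive regex parser are all assumptions for illustration. It shows how `FROM` references alone can yield both a run order and the backfill scope for a changed model.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models (name -> SQL). Names and SQL are illustrative only.
MODELS = {
    "sales.orders_clean": "SELECT * FROM raw.orders WHERE status <> 'void'",
    "sales.daily_revenue": (
        "SELECT order_date, SUM(amount) AS revenue "
        "FROM sales.orders_clean GROUP BY order_date"
    ),
    "sales.revenue_by_region": (
        "SELECT r.region, d.revenue FROM sales.daily_revenue d "
        "JOIN raw.regions r ON r.order_date = d.order_date"
    ),
}

# Naive pattern: the identifier that follows FROM or JOIN.
# A real parser would handle CTEs, subqueries, and quoting.
REF_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def dependencies(sql: str) -> set[str]:
    """Return the referenced names that are models in this project."""
    return {ref for ref in REF_PATTERN.findall(sql) if ref in MODELS}


def run_order(models: dict[str, str]) -> list[str]:
    """Topologically sort models so each one runs after its inputs."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def backfill_scope(models: dict[str, str], changed: str) -> set[str]:
    """Return the changed model plus everything downstream of it."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    scope = {changed}
    # Walking in run order guarantees inputs are classified before dependents.
    for name in run_order(models):
        if graph[name] & scope:
            scope.add(name)
    return scope


print(run_order(MODELS))
print(sorted(backfill_scope(MODELS, "sales.daily_revenue")))
```

Tables outside the project (`raw.orders`, `raw.regions` in this sketch) are ignored when ordering, which loosely mirrors how external inputs sit outside the run graph.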

<details>

<summary>What changed in Build</summary>

**New**

* Local development toolkit. Author and run the full flow on your laptop before deploying anything.
* Models declare dependencies through `FROM` references. The dependency graph, backfill scope, and lineage are derived from it, no bundle juggling.
* Eight materialization kinds out of the box: `full`, `view`, `incremental_by_time_range`, `incremental_by_unique_key`, `scd_type_2`, `seed`, and more.
* Python models on the engine's native compute (Snowpark, BigFrames, PySpark). Tables only.
* Seeds. CSV files committed alongside the code, usable as `seed` models. Good for prototypes and small master data (\~1M rows × \~10 columns is safe).
* External models. Reference existing warehouse tables as inputs, or publish a semantic model directly on top of them. `vulcan create external_models` scans the warehouse and writes the YAML for you.
* Versioning of code *and* data. Change a definition (`revenue × 10`) and the affected columns get rewritten. Roll back the definition, the data follows.
* Linters. Enforce project rules (for example, "every model must have a description"). A failing linter blocks the plan.
* Notifications target Console, Slack, Teams, Email, or a Webhook. Webhook is the escape hatch for custom workflows (kick off a stored procedure on a quality failure, for example).
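
As a rough illustration of the linter idea above, the sketch below enforces the "every model must have a description" rule in plain Python. The `Model` class and both function names are hypothetical; Vulcan's actual linter interface is not described in these notes.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical stand-in for a parsed model definition."""
    name: str
    description: str = ""


def lint_missing_descriptions(models: list[Model]) -> list[str]:
    """Return one violation message per model without a description."""
    return [
        f"{m.name}: model must have a description"
        for m in models
        if not m.description.strip()
    ]


def check_before_plan(models: list[Model]) -> None:
    """Raise if any rule fails, so the plan step never starts."""
    violations = lint_missing_descriptions(models)
    if violations:
        raise ValueError("lint failed:\n" + "\n".join(violations))
```

Calling `check_before_plan` with a model that has an empty description raises, which is the sketch's equivalent of a failing linter blocking the plan.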

**Improvements**

* Column descriptions written on the physical model propagate to the Data Product Hub *and* to the warehouse (Snowflake comments, for example). One source of truth.

</details>

***

## See what a change breaks before you ship it

Change one upstream column and three dashboards quietly go wrong two weeks later. Not anymore. Lineage in 2.0 goes down to the column and runs end to end, from source all the way to output, across engines.

The Data Product page now puts lineage and semantics together in one **Assets** tab. Models are split into Seed, Inputs, Outputs, Semantic, and Metrics. Every model carries its own lineage view. Click a column, see exactly where it flows. Even large graphs render without freezing the UI. [Explore a Data Product's assets](/consume/understand/assets.md).
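
Conceptually, "click a column, see where it flows" is a graph walk over column-level edges. The sketch below assumes a hypothetical edge map (the model and column names are invented) and uses a breadth-first traversal. It illustrates the idea only and says nothing about how DataOS stores or renders lineage.

```python
from collections import deque

# Hypothetical column-level edges: (model, column) -> columns derived from it.
EDGES = {
    ("raw.orders", "amount"): [("sales.orders_clean", "amount")],
    ("sales.orders_clean", "amount"): [("sales.daily_revenue", "revenue")],
    ("sales.daily_revenue", "revenue"): [
        ("semantic.revenue", "total_revenue"),
        ("metrics.revenue_growth", "value"),
    ],
}


def downstream_columns(start, edges=EDGES):
    """Breadth-first walk: every column that depends on `start`, nearest first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:  # guard against revisiting shared descendants
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


for model, column in downstream_columns(("raw.orders", "amount")):
    print(f"{model}.{column}")
```

Running the same walk before changing `raw.orders.amount` would list the semantic and metric columns that the change could affect.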

<details>

<summary>What changed in Assets and lineage</summary>

**New**

* Data Product page rebuilt around how data flows. Lineage and semantics live together in one **Assets** tab. Models split into Seed, Inputs, Outputs, Semantic, and Metrics.
* Per-model lineage view, showing how data arrives and where it leaves.
* End-to-end column lineage. Follow a single column from source through to output across the whole Data Product.
* External models (Inputs) can pull metadata directly from the source dataset.

**Improvements**

* The overview graph now includes apps, products, and metrics. A fuller picture of the Data Product's ecosystem.
* Lineage now surfaces data flow into the semantic and metric layers, not just physical datasets.
* Large lineage graphs are navigable. The UI no longer freezes on the long graphs that used to lock it up.

**Fixes**

* Column names in the Assets tab show in full instead of being truncated to a single character.
* Info icon and "+more" glossary pill no longer overlap in the Assets tab.
* Graph edges for the Metrics node render correctly.
* Tags, run status, and quality scores show up on the Data Product listing screen.
* Layout and CSS on the Product details page are cleaned up.
* Dataset icons for Trino and other dialects render correctly on the overview page.
* The Members panel shows every owner.

</details>

***

## Stop bad data at the door

Most data incidents are silent. A schema drifts upstream, nothing errors, and consumers keep building on broken numbers. In 2.0 the contract lives inside the Data Product. Schema, structure, and quality rules are checked in-band. Data that breaks a rule gets blocked before it's published, not flagged after it's already downstream. [Define a data contract](/build/stage-2-productize/define-the-contract.md).

Three frameworks, three jobs:

| Framework             | Job                                                                          | Blocks data flow?       |
| --------------------- | ---------------------------------------------------------------------------- | ----------------------- |
| **Tests**             | Unit tests on the SQL or Python logic with sample input and expected output. | Yes. Fails the build.   |
| **Audits/assertions** | Logical guards on real data (`not_null`, `unique`, custom SQL).              | Yes. Blocks downstream. |
| **DQ rules**          | Observability checks across the six ODPS dimensions.                         | No. Reports only.       |
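
The difference between a blocking audit and a reporting-only DQ rule can be sketched in a few lines of Python. Everything here is an assumption for illustration (the sample rows, the `AuditError` class, and the function names). Real audits and DQ rules are declared in the Data Product, not hand-written like this.

```python
# Hypothetical sample rows standing in for a model's output.
ROWS = [
    {"order_id": 1, "amount": 120.0, "region": "EU"},
    {"order_id": 2, "amount": 80.0, "region": None},
    {"order_id": 3, "amount": 45.5, "region": "US"},
]


class AuditError(Exception):
    """Raised when a blocking audit fails."""


def audit_not_null(rows, column):
    """Blocking: raise if any row has a null in `column`."""
    bad = [r for r in rows if r[column] is None]
    if bad:
        raise AuditError(f"not_null({column}) failed on {len(bad)} row(s)")


def audit_unique(rows, column):
    """Blocking: raise if `column` contains duplicate values."""
    values = [r[column] for r in rows]
    if len(values) != len(set(values)):
        raise AuditError(f"unique({column}) failed")


def dq_completeness(rows, column):
    """Non-blocking: return the share of non-null values, never raise."""
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r[column] is not None)
    return filled / len(rows)


def publish(rows):
    # Audits run first; an AuditError stops the publish.
    audit_not_null(rows, "order_id")
    audit_unique(rows, "order_id")
    # DQ rules only produce a report alongside the published data.
    report = {"completeness(region)": round(dq_completeness(rows, "region"), 2)}
    return {"published": True, "dq_report": report}


print(publish(ROWS))
```

In this sketch the null `region` lowers the completeness score but does not stop the publish, while a null or duplicate `order_id` would.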

<details>

<summary>What changed in Quality</summary>

**New**

* Tests folder. Unit tests on transformation logic with sample input and expected output, run as the first step of `vulcan plan`.
* \~30 to 40 prebuilt audits (`not_null`, `unique`, and more). Drop in your own custom SQL when none of those fit.
* DQ rules in the Soda dialect, run after each model run. Non-blocking by design, so observability never freezes production.
* Quality summary printed by ODPS dimension (completeness, uniqueness, validity, and the rest) after every plan.

</details>

***

## Trust data before you query it

Before you run a single query, you can see what a Data Product is, whether it passed its quality checks, when it last ran, and who depends on it, right in the catalog. Browse and search across every source at once, or scope down to a source type, database, or schema. No Slack thread to find out whether a table is safe to use.

<details>

<summary>What changed in Discover</summary>

**New**

* Global dataset browse and search. Browse all sources at once or scope to a specific source type, database, or schema.
* Quality, last-run status, run history, refresh latency, weekly active users, query counts, and average query time, all on the Data Product overview page.

**Improvements**

* Activity and the Connect tab show data inline in the Hub, so you can check freshness and consumption without leaving the page.

</details>

***

## Serve it to BI and apps

Once a Data Product is published, the same definition reaches BI tools, applications, notebooks, and the command line. No second Talos step, no copy of the semantic layer per consumer.

* **REST, GraphQL, and MySQL wire protocol** ship with every Data Product. The Connect tab gives you the JDBC string, the custom Power BI connector (`.mez`), and step-by-step setup.
* **Power BI Desktop** connects through the MySQL wire protocol using the shipped `.mez` connector. Use the shipped connector for DirectQuery, which the native MySQL connector does not support.
* **DBeaver and the `mysql` CLI** connect with one shell command:

  ```bash
  mysql --enable-cleartext-plugin \
        -h <instance>.dataos.cloud -P 3306 \
        -u <username> -p<api_key> \
        <tenant>.<data_product>
  ```
* **Workbench** runs against Minerva, Snowflake, Databricks, and the other connected engines directly. The 50,000-row limit is gone. Chain cells together to query a prior cell's result in a new one.
* **Studio** lets you query metrics directly and filter on multiple values at once.
* **Perspectives** are first-class. A perspective is a focused subset of the semantic model with its own data API, cached against the underlying SLA. If the data has not changed, the request returns the cached result and the warehouse stays cold.
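
The caching behaviour described for Perspectives can be pictured with the sketch below. It assumes, purely for illustration, that each refresh of the underlying data carries a version token and that the cache is keyed on it. The class name, the `data_version` parameter, and the call counter are hypothetical and do not describe the actual DataOS cache.

```python
class PerspectiveCache:
    """Hypothetical cache: reuse a result until the data version changes."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that hits the warehouse
        self._entries = {}           # perspective -> (data_version, result)
        self.warehouse_calls = 0     # counts how often the warehouse is used

    def get(self, perspective, data_version):
        cached = self._entries.get(perspective)
        if cached is not None and cached[0] == data_version:
            return cached[1]  # data unchanged: serve the cached result
        # Data is new or has changed: query the warehouse and refresh the entry.
        self.warehouse_calls += 1
        result = self._run_query(perspective)
        self._entries[perspective] = (data_version, result)
        return result
```

Two requests for the same perspective at the same data version cost one warehouse call in this sketch; a request at a new version triggers a second.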

<details>

<summary>What changed in Consume and Activation</summary>

**New**

* First Activation Interface, surfacing every way a Data Product reaches a consumer in one place.
* Data Product APIs across **Metadata** (with the new `/semantic` endpoint), **Quality**, **Activity** (with the new `/git-diff` endpoint), **Data**, **Perspectives**, and **Health**, with a downloadable Postman collection.
* MySQL protocol support for semantic models from any MySQL terminal client or DBeaver. Read-only by design (`SELECT` only).
* Power BI integration (initial release) for semantic model connectivity, dashboard creation, and measure and dimension exploration.
* Apps are now a first-class resource, in two forms: `link` (registered, hosted elsewhere) and `container` (hosted inside DataOS). Apps declare which Data Products they consume; the Data Product page shows the apps consuming it.
* Studio: query metrics directly and filter on multiple values at once.
* Workbench: cancel a running query mid-execution and retry after a network failure without restarting the bench.

**Improvements**

* Activation sidebar on the Connect tab reorganized by intent for clearer navigation.
* Activation options that depend on semantic definitions are locked until the dependency is met, so the requirement is explicit instead of buried in an error.
* Connection docs expanded: DBeaver added as a supported MySQL client, downloadable Postman collection in the REST API section, updated Power BI and MySQL CLI instructions.
* Workbench: messaging cleaned up across the trash confirmation, the discard-changes prompt, and per-cell query guidance.
* Workbench: lazy-loading scrolls smoothly without spinner interruptions.
* Workbench: "Run All" halts on the first cell failure instead of running every cell regardless.
* Workbench: DuckDB type mismatches are caught and surfaced with a clear error instead of failing silently.
* Workbench: pinned tables show all relevant columns.

**Fixes**

* Studio: metric context, navigation, and granularity work as expected.
* Studio: currency and percent metric formats apply correctly.
* Studio: measure filter payload is sent correctly.
* Studio: metrics with duplicate names no longer cause query failures or missing metadata.
* Workbench: consistent cell reference naming in cloned benches.
* Workbench: the "Bench Not Found" error on bench close is resolved.
* Workbench: significantly improved performance with a high number of columns.
* Workbench: querying large data via cell reference no longer degrades performance.
* Workbench: stable performance on large query execution.
* Workbench: joins with Minerva on large datasets work as expected.
* Workbench: local cells via intercell reference are queryable after save.
* Workbench: consistent permission handling for Data Consumers without depot access.

**Known limitation**

* Power BI report-level and dashboard-level filters are not supported yet.

</details>

***

## Let AI query your data safely

Pointing an LLM at a raw warehouse is how you get confident, wrong answers. In 2.0, Data Products are exposed to agents as governed tools, not raw tables. An agent resolves through your semantic layer and your contract instead of guessing which table is canonical. Open Home, grab a ready-to-use MCP configuration from Developer Tools setup, and connect Claude, Cursor, or a LangChain agent in minutes. [Connect AI with runtime MCP tools](https://app.gitbook.com/s/cL04JUTJPL73kRrjaBSa/consume-with-ai/runtime-mcp-tools).

The agent can do four things:

| Capability   | What an agent can ask                                                                                         |
| ------------ | ------------------------------------------------------------------------------------------------------------- |
| **Discover** | What Data Products do you have? What is in this one? What can it answer? Filter by domain, tag, glossary.     |
| **Trust**    | Tell me about the quality, lineage, and freshness. Show me failing checks. Is this column PII?                |
| **Query**    | Show me revenue by region last quarter. What measures does this Product expose? Slice this metric by city.    |
| **Build**    | Scaffold a new Data Product from a brief, generate the models and quality rules, run `vulcan plan` to verify. |

Retrieval is deterministic. The data comes from the Data Product API, not the LLM. If your Data Products are well-named and non-overlapping, the agent picks the right one almost every time.

<details>

<summary>What changed in AI</summary>

**New**

* Data Products are exposed to agents over MCP for both consuming and building.
* Built-in tools that search Data Products by domain, intent, metric, owner, or business term; trace upstream and downstream lineage for impact analysis; and profile tables and columns (row counts, nulls, min/max, distribution).
* Connect from Claude, Cursor, and VS Code, or from an agentic framework like LangChain.
* MCP configuration on Home under Developer Tools setup, with ready-to-use configurations for a range of LLM and agentic framework tools. No manual setup to wire an agent to a Data Product.
* Every model, dimension, measure, segment, join, and metric supports an `ai_context` block (instructions, synonyms, examples) so the agent gets your business context, not the public internet's.
* A 9 to 10 question Builder conversation: business context → data and structure (with catalog recommendations) → activation. The output is a `data_product_plan.md`, a scaffolded project, and a verified `vulcan plan`.

</details>

***

## Governance and access

Access control was reworked from the inside. Tenants are the boundary. A high-privilege role in tenant A gives you nothing in tenant B. A small set of opinionated roles replaces the 100-plus granular use cases from 1.0, and every allow/deny decision is logged.

| Role             | Scope      | What it grants                                                                                      |
| ---------------- | ---------- | --------------------------------------------------------------------------------------------------- |
| `tenant_admin`   | One tenant | Full access inside the tenant. Manages user roles, invites users, provisions compute.               |
| `data_admin`     | One tenant | Manages data sources (depots, lakehouses, related access control). Aimed at a domain data steward.  |
| `data_developer` | One tenant | Builds Data Products. CLI access, deploy, monitor logs, set alerts, explore own Data Products.      |
| `data_consumer`  | One tenant | Discover, explore, and consume Data Products they have access to. UI only.                          |

Defence policies are on by default. When a resource is deployed, DataOS auto-creates a policy reserving read, update, delete, and manage-access to the owner. Even a tenant admin cannot delete a resource they do not own without an explicit grant.
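
A minimal sketch of the owner-only default and the audit trail follows. It models only the owner and explicit grants, not roles or tenants. The `Resource` class, the `authorize` function, and the reason strings are assumptions. The log fields follow the ones listed for the audit trail in these notes (user, action, resource, decision, reason, timestamp).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Actions reserved to the owner by the default policy.
OWNER_ACTIONS = {"read", "update", "delete", "manage_access"}


@dataclass
class Resource:
    """Hypothetical deployed resource with an owner and explicit grants."""
    name: str
    owner: str
    grants: dict[str, set[str]] = field(default_factory=dict)  # user -> actions


AUDIT_LOG: list[dict] = []


def authorize(user: str, action: str, resource: Resource) -> bool:
    """Decide allow/deny and record the decision, whatever the outcome."""
    if user == resource.owner and action in OWNER_ACTIONS:
        decision, reason = "allow", "owner default policy"
    elif action in resource.grants.get(user, set()):
        decision, reason = "allow", "explicit grant"
    else:
        decision, reason = "deny", "no matching policy"
    AUDIT_LOG.append({
        "user": user,
        "action": action,
        "resource": resource.name,
        "decision": decision,
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return decision == "allow"
```

In this sketch a non-owner, tenant admin or not, is denied `delete` until the owner adds an entry to `grants`, and both the denial and the later approval end up in `AUDIT_LOG`.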

<details>

<summary>What changed in Governance and access</summary>

**New**

* Tenant-scoped policies. No global super-user beyond the instance operator, and the operator has no tenant-level rights without explicit grants.
* Centralized authorization audit trail. Every allow/deny decision is logged with user, action, resource, decision, reason, and timestamp, and is queryable from the UI.
* Proactive defence policies. Every resource auto-creates an owner-only policy on read, update, delete, and manage-access.
* Streamlined last-mile access granting. Owners hand out access from the resource UI itself, no Bifrost trip.
* Tenant invitation through email. Tenant admins invite by existing instance ID, by Modern AD ID, or by email (after adding the user as a guest in Modern's AD).
* One-time API token view-and-copy. Tokens are shown once on generation and never again.

**Improvements**

* Zero schema access shows a proper empty state instead of a broken view.
* Ownership and assignment fields handle multiple users correctly across listing pages.

**Fixes**

* Role assignments refresh immediately after add or delete, no stale data.
* The Tenant Admin option stays enabled after navigating back from the Profile page.
* Duplicate tenant names are validated in the UI instead of being silently accepted.
* Switching tenants no longer causes incorrect redirects or errors on detail pages.
* The filter icon on the Access page works.
* Latest run history is visible.

</details>

***

## Stand up a tenant with code, not tickets

Provisioning a secure environment used to mean weeks of back-and-forth with infrastructure. In 2.0, operators define compute, storage, lakehouse layers, and credentials as Terraform. Your data never leaves your cloud. New tenants come up the same way every time.

A tenant in 2.0 owns its own Data Products, its own compute (mapped to a node pool in any supported cloud), and its own access control. A failure in one tenant cannot touch another. Three patterns cover most customers:

| Pattern                          | When to use it                                           |
| -------------------------------- | -------------------------------------------------------- |
| Dev / Staging / Prod             | Simple starting point.                                   |
| Isolation by line of business    | Larger enterprises with separate domains and governance. |
| Shared / cross-functional tenant | When two lines of business need to collaborate.          |

<details>

<summary>What changed in Tenancy</summary>

**New**

* Tenants replace the loose 1.0 workspace concept. Every workload, resource, and Data Product lives inside a tenant.
* Each tenant has its own compute, mapped to a node pool in any supported cloud.
* Tenant switcher in the top bar of the Hub; same flow on the CLI.
* CLI download is baked into the Hub UI. Click, follow the steps, done.

**Improvements**

* The duplicate-compute problem from 1.0 is gone. Transformations run on the warehouse, the API layer runs on the tenant's compute, and the web app runs on the control plane.

</details>

***

## General fixes

A few cross-cutting fixes that did not belong in any single section.

<details>

<summary>What changed across the platform</summary>

**Fixes**

* Creating an API token with an existing name now shows a clear error message.
* Icon styles on the global left navigation panel are consistent.
* The owner-name filter on the Perspectives listing page works as expected.

</details>

***

## The shift behind 2.0

For a decade the goal was to pull everything into one warehouse and let people self-serve. That worked until AI agents showed up and started returning confident, wrong answers, because they had the tables but none of the context a human analyst carries in their head: which table is canonical, what "revenue" actually means, when the quarter really ends.

DataOS 2.0 treats the **Data Product as the atomic unit, and keeps it above the engine instead of locked inside it.** The contract, the semantics, the lineage, and the quality travel with the product, so the same definition is reachable wherever the bytes live. You build context once. Every consumer, including agents, reuses it instead of rebuilding it per query.

Your work follows the path a Data Product takes, and every capability is organized around it:

* **Discover.** Browse the catalog, profile data, and pull in what's missing before you commit to building.
* **Produce.** Write transformations in SQL or Python, enforce the contract in-band, and define the semantic layer once.
* **Consume.** Serve the same product to BI tools, applications, notebooks, and AI agents, each through the protocol that fits them.

Governance, lineage, and observability apply across all three, so they are properties of the platform, not separate projects.

{% hint style="info" %}
Want the full argument behind this design? Read [Data Products: The Essential Context for Enterprise AI](https://moderndata101.substack.com/p/data-products-the-essential-context).
{% endhint %}

***

## Known limitations

* Power BI report-level and dashboard-level filters are not supported yet.
* Perspectives are not editable. Create a new one to change columns or filters; this keeps consumer contracts stable.
* Perspectives cannot span multiple Data Products in this release. Dedicated access policies for perspectives ship in the next release.
* The Data Product MCP answers within one Data Product per query. Cross-product questions (for example, "list all PII columns across the system") are on the roadmap.
* `<Add any other confirmed limitations before publishing.>`

***

## Release details

| Area                | Detail                                                                             |
| ------------------- | ---------------------------------------------------------------------------------- |
| Control plane / CLI | `<Control Plane vX.Y.Z, Data Plane vX.Y.Z, CLI vX.Y.Z>`                            |
| Supported engines   | `<Snowflake, BigQuery, Databricks, Postgres, Trino, Spark with min versions>`      |
| Supported clouds    | `<AWS, Azure, GCP>`                                                                |
| Open standards      | Open Data Product Specification (ODPS), Open Semantic Interchange (tracking 0.1.1) |

### Component versions

| Component | Version      | Notes                                                                                     |
| --------- | ------------ | ----------------------------------------------------------------------------------------- |
| Vulcan    | `0.228.1.19` | Snowflake image `tmdcio/vulcan-snowflake:0.228.1.19`. Spark/Trino images on parity tag.   |
| Nilus     | `v2.0.19`    | Now deployable through the standard DataOS container stack. Available across all tenants. |
| UI        | `R-280526`   | Latest Platform UX release of the cycle. Preceded by `R-140526` earlier in the sprint.    |

This is the first 2.0 release. There is no upgrade or migration path from a prior version. Future releases will be tracked in the [Change log](https://github.com/moderndatacompany/dataos/blob/main/documentation/releasenotes/change-log/README.md), and what's coming next lives on the [Roadmap](https://github.com/moderndatacompany/dataos/blob/main/documentation/releasenotes/roadmap.md).

