# Overview

**Stage 2: Productize** is where you turn an idea and the data you discovered in Stage 1 into a tested, deployable artifact. By the end of this stage, you will have a Vulcan project with transformation logic, validation rules, a semantic layer, and everything the publish step needs.

Throughout this section, examples are drawn from `orders-analytics`: a real e-commerce data product built on PostgreSQL. It tracks order revenue, customer segmentation, product performance, and fulfillment conversion. Following the examples closely will give you a concrete mental model of what each step produces.

***

## Data Product Components

A Data Product built with Vulcan is composed of the following components:

| Component | What it is | In orders-analytics |
| --- | --- | --- |
| Source inputs | The upstream tables, seeds, or input your models read from. | Nine `public.*_ext` tables declared in `external_models.yaml` (orders, customers, products, regions, shipments, and more). |
| Models | SQL files that define transformation logic. Each model produces a table in your warehouse. | Twelve SQL models organized in three layers: `bronze` (source-aligned), `silver` (facts and dimensions), and `gold` (analytics-ready). |
| Seeds | Static CSV files loaded as lookup tables. | `order_status_lookup.csv` loaded into `bronze.order_status_lookup` with status group and fulfillment flags. |
| Macros | Reusable Python functions called inside SQL using the `@macro_name()` syntax. | `@safe_ratio()` for null-safe division and `@revenue_order_filter()` to exclude Cancelled orders from revenue calculations. |
| Tests | YAML files that verify model logic against mock data before anything touches production. | Two tests: one confirming Cancelled orders are excluded from `silver.fct_daily_sales`, another confirming the RFM Champion segment logic. |
| Assertions | SQL assertions that run automatically at materialization and block bad data from reaching downstream. | Three SQL assertion files covering daily sales metric consistency, RFM score validity, and order status lookup integrity. |
| Checks | Non-blocking YAML rule packs (`kind: dq`) that monitor data quality over time. | Seven DQ packs covering row counts, null checks, value validity, and cross-table referential integrity. |
| Semantics | YAML files (`kind: semantic`) that map physical tables to business-friendly measures, dimensions, and segments. | Six semantic models covering daily sales, weekly sales, customer profile, product profile, RFM segmentation, and sales funnel. |
| Metrics | YAML files (`kind: metric`) that combine a measure with a time column into a queryable time-series definition. | Five metrics: daily sales performance, weekly revenue trends, customer lifetime value, RFM value by segment, and fulfillment conversion. |
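
The Macros row describes `@safe_ratio()` only as "null-safe division"; its real signature and the way it is registered with Vulcan are not shown on this page. The plain-Python sketch below illustrates the intended semantics only, assuming that "null-safe" means a missing operand or a zero denominator yields a null result rather than an error. The function name and parameters are hypothetical stand-ins, not the Vulcan macro API.

```python
from typing import Optional


def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Illustrative null-safe division (hypothetical helper, not Vulcan's macro).

    Returns None when either operand is missing or the denominator is zero,
    so a ratio over incomplete data stays null instead of raising.
    """
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Example: a conversion rate over a day with no orders stays null.
shipped_orders, total_orders = 0, 0
conversion_rate = safe_ratio(shipped_orders, total_orders)
```

Returning a null rather than `0` keeps "no data" distinguishable from "a true zero" in downstream measures; whether the actual macro makes the same choice is an assumption here.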

All of these live inside a single Vulcan project folder that you initialize once and iterate on throughout the stage.

***

## Data flow in orders-analytics

The orders-analytics pipeline follows a bronze / silver / gold medallion structure:

```
Source tables (public.*_ext)
        |
        v
bronze.*   (source-aligned, raw shape, assertions)
        |
        v
silver.*   (facts and dimensions, business logic applied)
        |
        v
gold.*     (analytics-ready: RFM segmentation, sales funnel)
        |
        v
Semantic models --> Business metrics
```

Each layer reads from the layer above it. Vulcan resolves the execution order automatically from the SQL dependencies.

***

## Build flow

```
Stage 1: Discovery --> Stage 2: Productize --> Stage 3: Publish
```

| Stage | Goal |
| --- | --- |
| Stage 1: Discovery | Understand the data. Inspect metadata, explore with Workbench, bring in missing data. |
| **Stage 2: Productize** | **Build the artifact. Models, tests, semantics, and quality rules.** |
| Stage 3: Publish | Deploy and share. Configure deployment, publish to the Data Product Hub, communicate changes. |

Stage 2 is the core build stage. Nothing in Stage 3 can happen until this stage produces a tested, validated project.

***

## Steps in Stage 2

The pages in this section map directly to the work you do in sequence:

1. **LDK Setup**: install the Vulcan CLI and initialize the project. This is a prerequisite for the remaining steps.
2. **Connect to Engine**: wire your project to the data warehouse.
3. **Repository Setup**: connect your Git repository to DataOS for deployment.
4. **Configure project**: define `config.yaml` settings for connections, schedules, and linting.
5. **Define models**: write SQL and Python models for the transformation layer, then build the semantic layer and business metrics.
6. **Define the contract**: configure the linter, then add tests, assertions, and data quality checks to enforce quality at every layer.
7. **Define governance**: configure execution hooks, notifications, and access controls.
8. **Validate and test locally**: run `vulcan plan`, `vulcan test`, and debug before deploying.
9. **Explore recipes**: use proven patterns for common build tasks.
10. **Build with AI assistance**: use the Data Product MCP to design and generate faster.

Follow these steps in order on your first Data Product. Once you are familiar with the workflow, you can combine or skip steps as needed.
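
As a rough illustration of the data-flow note that execution order is resolved from SQL dependencies, the sketch below orders a small dependency graph with Python's standard `graphlib` module. It shows the general technique (topological sorting) and is not Vulcan's implementation. Apart from `silver.fct_daily_sales`, the model names and the dependency map are hypothetical placeholders shaped like the bronze / silver / gold layers.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each model -> the set of models it reads from.
# Names are illustrative, not the actual orders-analytics model list.
DEPENDENCIES = {
    "bronze.orders": {"public.orders_ext"},
    "bronze.customers": {"public.customers_ext"},
    "silver.fct_daily_sales": {"bronze.orders"},
    "silver.dim_customers": {"bronze.customers"},
    "gold.rfm_segmentation": {"silver.fct_daily_sales", "silver.dim_customers"},
}


def execution_order(dependencies):
    """Return model names so that every model appears after all of its inputs."""
    # static_order() yields predecessors first and raises CycleError on a cycle.
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
for position, model in enumerate(order, start=1):
    print(position, model)
```

Several valid orderings exist for this graph. The only guarantee is that a model never runs before the models it reads from, which is the property the medallion layering relies on.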