# LDK Setup

The Local Development Kit (LDK) is the local toolkit for authoring and testing data products. Setup takes six steps: install Python, create an isolated environment, install Vulcan, initialize the project, connect an engine, and verify the connection. You only need to do this once.

{% hint style="info" %}
Vulcan requires **Python 3.10**. Versions outside `>=3.9, <3.11` are not supported.
{% endhint %}

## 1. Install Python 3.10

Check first:

```sh
python3.10 --version
```

If it is missing, install it:

{% tabs %}
{% tab title="macOS (Homebrew)" %}
```bash
brew install python@3.10
```
{% endtab %}

{% tab title="Ubuntu / Debian" %}
```bash
sudo apt update && sudo apt install python3.10 python3.10-venv
```
{% endtab %}

{% tab title="Windows" %}
Download the installer from [python.org/downloads](https://www.python.org/downloads/) and check **Add Python to PATH** during setup.
{% endtab %}
{% endtabs %}

## 2. Create a virtual environment

Always install Vulcan inside an isolated environment so it does not conflict with other Python projects.

```sh
python3.10 -m venv .venv
```

Activate it:

{% tabs %}
{% tab title="macOS / Linux" %}
```bash
source .venv/bin/activate
```
{% endtab %}

{% tab title="Windows" %}
```bash
.venv\Scripts\activate
```
{% endtab %}
{% endtabs %}

`(.venv)` in your prompt confirms it is active. Upgrade pip:

```sh
python3.10 -m pip install --upgrade pip
```

## 3. Install Vulcan

Vulcan ships as a Python wheel.
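Before installing the wheel, it can help to confirm that the active interpreter falls inside the supported `>=3.9, <3.11` range and that a virtual environment is active. The following is a minimal sketch of such a preflight check. `check_environment` is a hypothetical helper written for this page, not part of Vulcan.

```python
import sys

SUPPORTED_MIN = (3, 9)   # inclusive lower bound of the `>=3.9, <3.11` range
SUPPORTED_MAX = (3, 11)  # exclusive upper bound


def check_environment(version=None, prefix=None, base_prefix=None):
    """Return a list of problems; an empty list means the environment looks fine.

    Hypothetical helper: the arguments default to the running interpreter
    and exist only so the check can be exercised with other values.
    """
    version = tuple(version or sys.version_info)[:2]
    prefix = prefix or sys.prefix
    base_prefix = base_prefix or sys.base_prefix

    problems = []
    if not (SUPPORTED_MIN <= version < SUPPORTED_MAX):
        problems.append(
            f"Python {version[0]}.{version[1]} is outside >=3.9, <3.11"
        )
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    if prefix == base_prefix:
        problems.append("no virtual environment is active")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it with the interpreter you intend to use for `pip install`. If it prints nothing, both conditions hold.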
{% file src="/files/sQXlQiYNo22hTmSXUKmo" %}

Download the Vulcan `.whl` file and place it in your working directory (or use its full path), then install it. The wheel bundles the DuckDB engine for local experimentation:

```sh
pip install "./vulcan-0.228.1.25-py3-none-any.whl"
```

If you target a specific engine, install the matching extra. Quote the path so your shell does not interpret the brackets:

```bash
pip install "./vulcan-0.228.1.25-py3-none-any.whl[postgres]"
# or [snowflake], [databricks], [spark], [trino]
```

Verify:

```sh
vulcan --version
python3.10 -c "from vulcan import Context; print('Vulcan OK')"
```

## 4. Initialize the project

```bash
vulcan init
```

The initializer scaffolds the project structure:

| Folder / file | What goes here |
| --- | --- |
| `config.yaml` | Project configuration: connections, model defaults, linting. |
| `models/` | SQL and Python model files. Each produces a table or view. |
| `dq/` | Data quality rule packs (`kind: dq`). Non-blocking; monitor quality over time. |
| `models/semantics/` | Semantic models (`kind: semantic`). Business-friendly wrappers over physical models. |
| `models/metrics/` | Metric definitions (`kind: metric`). Time-series analytical definitions. |
| `seeds/` | CSV files loaded as static tables. |
| `audits/` | SQL audit files. They run at materialisation and block execution if they return rows. |
| `tests/` | YAML unit tests. Run with `vulcan test` before touching the warehouse. |
| `macros/` | Reusable SQL snippets and Jinja macros. |
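To confirm that the scaffold is complete, a short script can compare the project directory against the entries in the table above. This is a minimal sketch that assumes your scaffold matches that table exactly. `missing_entries` is a hypothetical helper, not a Vulcan command.

```python
from pathlib import Path

# Entries copied from the table above; adjust them if your scaffold differs.
EXPECTED_DIRS = [
    "models",
    "dq",
    "models/semantics",
    "models/metrics",
    "seeds",
    "audits",
    "tests",
    "macros",
]
EXPECTED_FILES = ["config.yaml"]


def missing_entries(project_root="."):
    """Return the expected folders and files that are absent under project_root."""
    root = Path(project_root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for entry in missing_entries():
        print("missing:", entry)
```

Run it from the project root after `vulcan init`. No output means every listed folder and file is present.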

## 5. Set up your engine

To connect Vulcan locally, use an existing engine instance or spin one up via Docker, then add its connection to `config.yaml`. For the full connection reference per engine, see [Connect engine](/build/stage-2-productize/connect-to-engine.md).

{% tabs %}
{% tab title="Postgres" %}
**Use an existing Postgres instance:**

If you already have a Postgres instance, you need its host, port, database, user, and password. See the [Postgres connection options](/build/stage-2-productize/connect-to-engine/postgres.md#required-connection-options) for all supported fields.

Set your password as an environment variable:

{% tabs %}
{% tab title="Mac/Linux" %}
```bash
export POSTGRES_PASSWORD='your_password'
```
{% endtab %}

{% tab title="Windows" %}
```powershell
$env:POSTGRES_PASSWORD = 'your_password'
```
{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`. The values shown match the Docker setup below; replace them with the details of your own instance:

```yaml
gateways:
  default:
    connection:
      type: postgres
      host: localhost
      port: 5433
      database: warehouse
      user: vulcan
      password: "{{ env_var('POSTGRES_PASSWORD') }}"

model_defaults:
  dialect: postgres
```

**Start Postgres locally with Docker:**

Create the network once:

```bash
docker network create vulcan
```

Save this as `docker/docker-compose.warehouse.yml`:

```yaml
volumes:
  warehouse:
    driver: local

networks:
  vulcan:
    external: true

services:
  warehouse:
    image: postgres:17-alpine
    environment:
      POSTGRES_DB: warehouse
      POSTGRES_USER: vulcan
      POSTGRES_PASSWORD: vulcan
      POSTGRES_HOST_AUTH_METHOD: trust
    ports:
      - "5433:5432"
    volumes:
      - warehouse:/var/lib/postgresql/data
    networks:
      - vulcan
```

Start it:

```bash
docker compose -f docker/docker-compose.warehouse.yml up -d
```

Full reference: [Connect engine → Postgres](/build/stage-2-productize/connect-to-engine/postgres.md).
{% endtab %}

{% tab title="Snowflake" %}
**Use an existing Snowflake instance:**

Use an existing Snowflake account and warehouse. No local Docker service is needed for Snowflake.

Set your password as an environment variable:

{% tabs %}
{% tab title="Mac/Linux" %}
```bash
export SNOWFLAKE_PASSWORD='your_password'
```
{% endtab %}

{% tab title="Windows" %}
```powershell
$env:SNOWFLAKE_PASSWORD = 'your_password'
```
{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`:

```yaml
gateways:
  default:
    connection:
      type: snowflake
      account: your_account
      user: your_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      warehouse: your_warehouse
      database: your_database
      role: your_role

model_defaults:
  dialect: snowflake
```

Full reference: [Connect engine → Snowflake](/build/stage-2-productize/connect-to-engine/snowflake.md).
{% endtab %}

{% tab title="Databricks" %}
**Use an existing Databricks workspace:**

Use an existing workspace with SQL warehouse or cluster access. No local Docker service is needed for Databricks.

Set the access token as an environment variable:

{% tabs %}
{% tab title="Mac/Linux" %}
```bash
export DATABRICKS_TOKEN='your_token'
```
{% endtab %}

{% tab title="Windows" %}
```powershell
$env:DATABRICKS_TOKEN = 'your_token'
```
{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`:

```yaml
gateways:
  default:
    connection:
      type: databricks
      server_hostname: your-workspace.azuredatabricks.net
      http_path: /sql/1.0/warehouses/your_warehouse_id
      access_token: "{{ env_var('DATABRICKS_TOKEN') }}"
      catalog: your_catalog

model_defaults:
  dialect: databricks
```

Full reference: [Connect engine → Databricks](/build/stage-2-productize/connect-to-engine/databricks.md).
{% endtab %}

{% tab title="Spark" %}
**Use an existing Spark cluster:**

If you already have a Spark cluster, use the `vulcan-cli` service below and update `spark.master` in `config.yaml` to point to your cluster.

Use this connection in `config.yaml`:

{% code overflow="wrap" %}
```yaml
gateways:
  default:
    connection:
      type: spark
      config:
        spark.master: spark://spark-master:7077
        spark.app.name: vulcan
        spark.sql.catalog.local: org.apache.iceberg.spark.SparkCatalog
        spark.sql.catalog.local.type: rest
        spark.sql.catalog.local.uri: http://iceberg-rest:8181
        spark.sql.catalog.local.warehouse: s3://warehouse/
        spark.sql.catalog.local.io-impl: org.apache.iceberg.aws.s3.S3FileIO
        spark.sql.catalog.local.s3.endpoint: http://minio:9000
        spark.sql.catalog.local.s3.path-style-access: "true"
        spark.hadoop.fs.s3a.access.key: admin
        spark.hadoop.fs.s3a.secret.key: password
        spark.hadoop.fs.s3a.endpoint: http://minio:9000
        spark.hadoop.fs.s3a.path.style.access: "true"

model_defaults:
  dialect: spark2
```
{% endcode %}

**Start Spark locally with Docker:**

Start a local Spark standalone cluster with MinIO and an Iceberg REST catalog. This avoids Windows Hadoop or `winutils.exe` issues because the Spark driver runs inside Linux.

Place `vulcan-0.228.1.25-py3-none-any.whl` in your project root, then save this as `docker/docker-compose.spark.yml`:

{% code overflow="wrap" expandable="true" %}
```yaml
services:
  # Spark standalone cluster for running Spark executors in containers.
  spark-master:
    image: tmdcio/vulcan-spark-base:0.228.1.21
    container_name: spark-seeds-minimal-spark-master
    restart: unless-stopped
    command: ["/bin/bash", "-lc", "/opt/spark/sbin/start-master.sh --host 0.0.0.0 --port 7077 --webui-port 8080 && tail -f /opt/spark/logs/*"]
    ports:
      - "7077:7077"
      - "8080:8080"
    networks:
      - spark-seeds-minimal-net

  spark-worker:
    image: tmdcio/vulcan-spark-base:0.228.1.21
    container_name: spark-seeds-minimal-spark-worker
    restart: unless-stopped
    command: ["/bin/bash", "-lc", "/opt/spark/sbin/start-worker.sh spark://spark-master:7077 --webui-port 8081 && tail -f /opt/spark/logs/*"]
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    networks:
      - spark-seeds-minimal-net

  # MinIO for S3-compatible storage.
  minio:
    image: minio/minio:latest
    container_name: spark-seeds-minimal-minio
    restart: unless-stopped
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      spark-seeds-minimal-net:
        aliases:
          - minio
          - warehouse.minio
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 5s
      timeout: 5s
      retries: 10

  # MinIO setup - creates warehouse bucket.
  mc:
    image: minio/mc:latest
    container_name: spark-seeds-minimal-mc
    networks:
      - spark-seeds-minimal-net
    depends_on:
      minio:
        condition: service_healthy
    entrypoint: >
      /bin/sh -c "
      mc alias set minio http://minio:9000 admin password;
      mc mb --ignore-existing minio/warehouse;
      mc anonymous set public minio/warehouse;
      exit 0;
      "

  # Iceberg REST Catalog.
  iceberg-rest:
    image: tabulario/iceberg-rest:latest
    container_name: spark-seeds-minimal-iceberg-rest
    restart: unless-stopped
    ports:
      - "8181:8181"
    networks:
      - spark-seeds-minimal-net
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://minio:9000
    depends_on:
      minio:
        condition: service_healthy

networks:
  spark-seeds-minimal-net:
    driver: bridge

volumes:
  minio_data:
```
{% endcode %}

Start the Spark services:

```bash
docker compose -f docker/docker-compose.spark.yml up -d
```

Verify Vulcan through the CLI container:

```bash
docker compose -f docker/docker-compose.spark.yml run --rm vulcan-cli vulcan --version
```

Full reference: [Connect engine → Spark](/build/stage-2-productize/connect-to-engine/spark.md).
{% endtab %}

{% tab title="Trino" %}
**Use an existing Trino cluster:**

Use an existing Trino cluster with a configured catalog. No local Docker service is needed for Trino.

Set the password only if your cluster requires it:

{% tabs %}
{% tab title="Mac/Linux" %}
```bash
export TRINO_PASSWORD='your_password'
```
{% endtab %}

{% tab title="Windows" %}
```powershell
$env:TRINO_PASSWORD = 'your_password'
```
{% endtab %}
{% endtabs %}

Use this connection in `config.yaml`:

```yaml
gateways:
  default:
    connection:
      type: trino
      host: your_trino_host
      port: 8080
      user: your_user
      catalog: your_catalog
      http_scheme: https
      password: "{{ env_var('TRINO_PASSWORD') }}"

model_defaults:
  dialect: trino
```

Full reference: [Connect engine → Trino](/build/stage-2-productize/connect-to-engine/trino.md).
{% endtab %}
{% endtabs %}

## 6. Verify

```bash
vulcan info # check if the connection is successful
```

A successful `vulcan info` confirms the project is ready to configure and build.

## Troubleshooting

**`... is not a supported wheel on this platform`**

Your Python is outside `>=3.9, <3.11`. Recreate the environment with Python 3.10:

```sh
deactivate && rm -rf .venv
python3.10 -m venv .venv && source .venv/bin/activate
```

**`zsh: no matches found`**

Your shell is interpreting the `[engine]` brackets. Quote the wheel path: `pip install "./vulcan-...whl[snowflake]"`.

**Dependency conflicts**

Install into a fresh virtual environment, never the system Python. To overwrite an existing install, use `pip install --force-reinstall "./vulcan--py3-none-any.whl"`.
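
A related check when `vulcan info` cannot connect: the `config.yaml` examples in step 5 read secrets through `env_var('...')`, so a variable that was never exported in the current shell may be the cause. The sketch below scans the config text for those references and lists the ones that are unset. It assumes the quoted `env_var('NAME')` form shown on this page. `unset_env_vars` is a hypothetical helper, not part of Vulcan.

```python
import os
import re

# Matches env_var('NAME') or env_var("NAME") as written in the config examples.
ENV_VAR_CALL = re.compile(
    r"""env_var\(\s*['"]([A-Za-z_][A-Za-z0-9_]*)['"]\s*\)"""
)


def unset_env_vars(config_text, environ=None):
    """Return the variable names referenced in config_text that are unset or empty."""
    environ = os.environ if environ is None else environ
    names = ENV_VAR_CALL.findall(config_text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(names) if not environ.get(name)]


if __name__ == "__main__":
    # Assumes the script is run from the project root, next to config.yaml.
    with open("config.yaml", encoding="utf-8") as handle:
        for name in unset_env_vars(handle.read()):
            print("not set:", name)
```

Run it in the same shell session you use for `vulcan info`, since exported variables do not carry over to new terminals.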
