
# Repository Setup

Vulcan projects live in a Git repository. When you deploy a data product, DataOS reads a deployment manifest (a YAML file with `type: vulcan`) that points to that repository.

To generate the deployment manifest, move to the project root and run:

```bash
cd my-vulcan-project
vulcan create_deploy_yaml  # generates the DataOS Vulcan resource deploy YAML
```

Expected output:

```
Generated deploy YAML: /Users/johndoe/Documents/data-product/my-vulcan-project/my-vulcan-project-deploy.yaml
```
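Before filling in the placeholders, a quick sanity check can confirm that the generated file looks like a Vulcan manifest. The sketch below is not part of the Vulcan or DataOS CLI: `check_deploy_yaml` is a hypothetical helper that only greps for the `type: vulcan` line and the `repo:` block that appear in the generated YAML.

```shell
# Hypothetical helper (not a Vulcan or DataOS command):
# sanity-check a generated deploy YAML with plain grep.
check_deploy_yaml() {
  file="$1"
  # The file must exist and be readable.
  [ -r "$file" ] || { echo "missing: $file" >&2; return 1; }
  # The top-level resource type must be vulcan.
  grep -Eq '^type:[[:space:]]*vulcan[[:space:]]*$' "$file" \
    || { echo "not a vulcan manifest: $file" >&2; return 1; }
  # The spec must contain an indented repo block.
  grep -Eq '^[[:space:]]+repo:[[:space:]]*$' "$file" \
    || { echo "no repo block: $file" >&2; return 1; }
  echo "ok: $file"
}
```

For example, `check_deploy_yaml my-vulcan-project-deploy.yaml` prints `ok: my-vulcan-project-deploy.yaml` when both lines are present and returns a non-zero status otherwise. It does not validate the YAML itself.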

<details>

<summary>Generated deploy YAML</summary>

{% code overflow="wrap" expandable="true" %}

```yaml
version: v1alpha
type: vulcan
name: my-vulcan-project
description: "my-vulcan-project Vulcan project."
tags:
  -
spec:
  runAsUser: <Optional> # only add if you want to run as a specific user
  compute: general-purpose-shared # check available computes using dataos CLI
  engine: snowflake
  repo:
    url: <git repo url>
    syncFlags:
      - '--ref=main' # change to the branch you want to deploy
      - '--submodules=off' # change if you want to sync submodules
    baseDir: <repo-name>/<path-to-the-project>
    secret: default.<secret-name> # optional: if repo is private, create tenant secret and use it here
  use: # Use this section if you are using the connection of source type without depot
    projection:
      secrets:
        - id: default.<secret-name>
          contextAlias: <secret-alias>
      projections:
        envVars:
          - key: <PROJECT_ENV_VAR_NAME>
            template: "{{ secrets[<secret-alias>].<SECRET_KEY> | base64_decode }}"

  workflow:
    schedule:
      crons:
        - '0 0 * * *'
      endOn: '<end-date-in-ISO-8601-format>' # '2027-01-01T00:00:00-00:00'
      timezone: '<timezone>' # 'US/Pacific'
      concurrencyPolicy: Forbid

    logLevel: INFO #default
    resource: #optional
      request:
        cpu: "<cpu-request>" # '200m'
        memory: "<memory-request>" # '512Mi'
      limit:
        cpu: "<cpu-limit>" # '1000m'
        memory: "<memory-limit>" # '1Gi'


    plan:
      command:
        - vulcan
      arguments:
        - plan
        - --auto-apply
    run:
      command:
        - vulcan
      arguments:
        - run

  api:
    replicas: 1
    logLevel: INFO
    resource:
      request:
        cpu: "<cpu-request>" # '200m'
        memory: "<memory-request>" # '512Mi'
      limit:
        cpu: "<cpu-limit>" # '1000m'
        memory: "<memory-limit>" # '1Gi'


```

{% endcode %}

</details>

DataOS pulls your project files at runtime using a process called git-sync.

To do this securely, you need to:

1. Store your Git credentials as a **Secret** inside DataOS.
2. Reference that Secret in the `repo` block of your deployment manifest.

***

## Before you begin

* The DataOS CLI is installed. See [CLI Setup](/build/readme/cli-setup.md).
* Your repository is hosted on one of the supported providers: **GitHub**, **Bitbucket**, or **AWS CodeCommit**.
* You have a **personal access token** (or app password for Bitbucket) with at least read access to the repository.
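git-sync can only pull the project if the token can read the repository and the branch named in `--ref` exists. The following is a minimal pre-flight sketch, assuming `git` is installed locally and that Git obtains the username and token through its usual prompt or a credential helper. `check_repo_ref` is a hypothetical helper name, not a DataOS or Vulcan command.

```shell
# Hypothetical pre-flight check: can these credentials read the branch
# that the deploy YAML will sync ('--ref=main' in the generated manifest)?
# Git asks for the username and token itself (prompt or credential helper),
# so the token never has to appear on the command line.
check_repo_ref() {
  repo_url="$1"     # the value you plan to put in spec.repo.url
  ref="${2:-main}"  # branch name; defaults to main
  # --exit-code makes ls-remote fail (status 2) when no matching ref exists.
  if git ls-remote --exit-code --heads "$repo_url" "$ref" >/dev/null; then
    echo "readable: $ref"
  else
    echo "cannot read '$ref' from $repo_url" >&2
    return 1
  fi
}
```

Call it with the repository URL and branch you intend to deploy, for example `check_repo_ref <git repo url> main`. A failure means either the credentials lack read access or the branch does not exist.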

## Create a Git credentials Secret

### Write the Secret manifest

Create a file named `git-sync-secret.yml`:

```yaml
name: git-sync
version: v2alpha
type: secret
workspace: system
layer: user
description: "Secret for git-sync authentication"
secret:
  type: key-value
  data:
    GITSYNC_USERNAME: "<your-git-username>"
    GITSYNC_PASSWORD: "<your-personal-access-token>"
```

Replace the placeholders:

| Placeholder                    | What to put here                                              |
| ------------------------------ | ------------------------------------------------------------- |
| `<your-git-username>`          | Your Git account username (e.g. `johndoe`)                    |
| `<your-personal-access-token>` | A personal access token or app password with repo read access |

**Provider-specific notes:**

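* **GitHub:** Generate a token at **Settings > Developer settings > Personal access tokens**. For a classic token, grant the `repo` scope (note that this scope also allows writes). For a fine-grained token, read-only **Contents** access on the target repository is enough.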
* **Bitbucket:** Generate an app password at **Personal settings > App passwords**. Grant `Repositories: Read` permission.
* **AWS CodeCommit:** Use your IAM username and an HTTPS Git credential generated in the AWS IAM console.
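If you would rather not paste the token into an editor, the manifest above can be rendered from environment variables. This is a sketch only: `write_git_sync_secret`, `GIT_USER`, and `GIT_TOKEN` are hypothetical names chosen for illustration, and the helper assumes the values contain no double quotes or backslashes, which would need YAML escaping.

```shell
# Hypothetical helper: render git-sync-secret.yml from environment variables.
# GIT_USER and GIT_TOKEN are assumed variable names, not DataOS settings.
write_git_sync_secret() {
  out="${1:-git-sync-secret.yml}"
  [ -n "${GIT_USER:-}" ]  || { echo "set GIT_USER to your Git username" >&2; return 1; }
  [ -n "${GIT_TOKEN:-}" ] || { echo "set GIT_TOKEN to your token or app password" >&2; return 1; }
  (
    # New files are created readable by the current user only.
    umask 077
    {
      printf 'name: git-sync\n'
      printf 'version: v2alpha\n'
      printf 'type: secret\n'
      printf 'workspace: system\n'
      printf 'layer: user\n'
      printf 'description: "Secret for git-sync authentication"\n'
      printf 'secret:\n'
      printf '  type: key-value\n'
      printf '  data:\n'
      printf '    GITSYNC_USERNAME: "%s"\n' "$GIT_USER"
      printf '    GITSYNC_PASSWORD: "%s"\n' "$GIT_TOKEN"
    } > "$out"
    # Tighten permissions in case the file already existed.
    chmod 600 "$out"
  )
}
```

With both variables set in the current shell, `write_git_sync_secret` writes `git-sync-secret.yml` in the working directory. The file still contains the token in plain text, so treat it as sensitive.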

### Apply the Secret

```sh
dataos-ctl resource apply -f git-sync-secret.yml
```

### Verify the Secret

```sh
dataos-ctl resource get -t secret -n git-sync
```

> **Do not commit this file.** Once applied, the credentials are stored encrypted inside DataOS. Add `git-sync-secret.yml` to your `.gitignore`.

## Best practices

* Use a **dedicated service account** or bot user for `GITSYNC_USERNAME` rather than a personal account. This prevents access from breaking if team members leave.
* **Scope the token narrowly**: read-only access to the specific repository is sufficient. Never grant write access unless your workflow requires pushing back to the repo.
* **Rotate tokens regularly** and update the Secret by re-applying the manifest:

  ```sh
  dataos-ctl resource apply -f git-sync-secret.yml
  ```
* Keep `git-sync-secret.yml` out of version control. Add it to `.gitignore`:

  ```text
  git-sync-secret.yml
  ```
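To confirm that the ignore rule is in effect, `git check-ignore` can be asked directly. A small sketch, assuming it is run from the directory that holds the manifest. `ignore_secret_manifest` is a hypothetical helper name.

```shell
# Hypothetical helper: make sure git-sync-secret.yml is ignored in the
# current repository, appending it to .gitignore only when needed.
ignore_secret_manifest() {
  file="${1:-git-sync-secret.yml}"
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 \
    || { echo "not inside a Git repository" >&2; return 1; }
  # git check-ignore exits 0 when an ignore rule already matches the path.
  if git check-ignore -q "$file"; then
    echo "already ignored: $file"
  else
    # Start on a fresh line if .gitignore lacks a trailing newline.
    if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
      echo >> .gitignore
    fi
    printf '%s\n' "$file" >> .gitignore
    echo "added to .gitignore: $file"
  fi
}
```

Ignore rules do not apply to files that Git already tracks. If the manifest was committed earlier, remove it from the index with `git rm --cached git-sync-secret.yml` and rotate the token.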


