> For the complete documentation index, see [llms.txt](https://v2.dataos.info/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://v2.dataos.info/build/stage-2-productize/explore-recipes/incremental-by-time.md).

# Incremental by time

This recipe shows how to build an `INCREMENTAL_BY_TIME_RANGE` model: the most commonly used pattern for time-series data like events, transactions, logs, and daily aggregations.

***

## Why incremental?

A `FULL` model rebuilds the entire table from scratch on every run. If you have two years of daily order data, every run processes 730 days (most of which haven't changed).

With `INCREMENTAL_BY_TIME_RANGE`, Vulcan tracks which intervals have been processed and only executes the missing or outdated ones. The result: faster, cheaper runs that scale with your data.

**Note on orders-analytics:** All models in `orders-analytics` use `kind FULL`. The demo dataset is small enough that a full rebuild every 15 minutes is fast and simple. If your order data grows to millions of rows per day, switching `silver.fct_daily_sales` to `INCREMENTAL_BY_TIME_RANGE` would be the right move. The pattern below shows exactly how to do that.

***

## The pattern

Two requirements make incremental-by-time work:

1. **Declare a `time_column`** in the `MODEL` block: the column that represents when each row occurred.
2. **Filter the input using `@start_ds`/`@end_ds`** (or `@start_dt`/`@end_dt` for timestamps): Vulcan substitutes the correct values for each interval.

Here is what `silver.fct_daily_sales` would look like if converted from `FULL` to `INCREMENTAL_BY_TIME_RANGE`:

```sql
MODEL (
  name silver.fct_daily_sales,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column order_date
  ),
  start '2025-01-01',
  cron '*/15 * * * *',
  grains [order_date, region_id, customer_id, product_id],
  assertions (
    not_null(columns := (order_date, region_id, region_name, customer_id, product_id, category)),
    forall(criteria := (total_orders >= 0, total_revenue >= 0))
  )
);

WITH order_metrics AS (
  SELECT
    o.order_date::DATE AS order_date,
    c.region_id,
    r.region_name,
    o.customer_id,
    oi.product_id,
    p.category,
    COUNT(DISTINCT o.order_id)::INTEGER AS total_orders,
    SUM(oi.quantity * oi.unit_price)::NUMERIC(15, 2) AS total_revenue
  FROM bronze.orders AS o
  INNER JOIN bronze.customers AS c ON o.customer_id = c.customer_id
  INNER JOIN bronze.regions AS r ON c.region_id = r.region_id
  INNER JOIN bronze.order_items AS oi ON o.order_id = oi.order_id
  INNER JOIN bronze.products AS p ON oi.product_id = p.product_id
  WHERE @revenue_order_filter(o.order_status)
    AND o.order_date BETWEEN @start_ds AND @end_ds
  GROUP BY o.order_date::DATE, c.region_id, r.region_name, o.customer_id, oi.product_id, p.category
)
SELECT ...
FROM order_metrics;
```

The key addition is `AND o.order_date BETWEEN @start_ds AND @end_ds` in the `WHERE` clause. Vulcan substitutes `@start_ds` and `@end_ds` with the actual date strings for the interval being processed. On a 15-minute cron, this processes a rolling window of recent data on each run.

***

## What Vulcan does under the hood

When you run `vulcan plan dev`:

1. Vulcan checks the state store to find which intervals have already been processed.
2. For intervals that are missing, it substitutes `@start_ds`/`@end_ds` with the actual date values.
3. It DELETEs existing rows in the target interval (to handle late-arriving data), then INSERTs the new rows.
4. The state store is updated to mark the interval as complete.

For Snowflake and Postgres, this is: **DELETE by time range, then INSERT**. For Spark, this is: **INSERT OVERWRITE by time column partition**.

***

## Handling late-arriving data

Some events arrive late (an order created on Monday shows up in the source on Wednesday). Use `lookback` to reprocess the last N intervals on every run, so late arrivals are captured:

```sql
MODEL (
  name analytics.daily_sales,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column order_date,
    lookback 2
  ),
  start '2024-01-01',
  cron '@daily'
);
```

With `lookback 2` and a daily cron, every run processes today, yesterday, and the day before yesterday. This ensures late data arriving within 2 days is always captured.

***

## Using timestamps instead of dates

Use `@start_dt`/`@end_dt` when your time column is a `TIMESTAMP` rather than a `DATE`:

```sql
MODEL (
  name analytics.events,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column event_timestamp
  ),
  start '2024-01-01',
  cron '@hourly'
);

SELECT
  event_id,
  user_id,
  event_type,
  event_timestamp
FROM raw.raw_events
WHERE event_timestamp BETWEEN @start_dt AND @end_dt
```

**Important:** Your `time_column` should be in UTC timezone. Vulcan's scheduler and time macros work in UTC.

***

## Partitioning for performance

`INCREMENTAL_BY_TIME_RANGE` automatically adds the `time_column` to the partition key. This lets the warehouse skip irrelevant partitions, making downstream queries faster:

```sql
-- Vulcan automatically adds order_date to the partition key.
-- To partition by an additional column too:
MODEL (
  name analytics.daily_sales,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column order_date
  ),
  partitioned_by ARRAY[order_date, region_id],
  start '2024-01-01',
  cron '@daily'
);
```

To disable automatic time column partitioning (and control it entirely yourself):

```sql
kind INCREMENTAL_BY_TIME_RANGE (
  time_column order_date,
  partition_by_time_column false
),
partitioned_by (region_id)
```

***

## Backfilling historical data

To backfill historical data after first deploying the model, use `--restate-model`:

```bash
vulcan plan prod \
  --restate-model analytics.daily_sales \
  --start 2023-01-01 \
  --end 2024-01-01 \
  --auto-apply
```

This processes the entire specified range, deleting and reinserting each interval.

***

## Common mistakes

| Mistake                                           | Symptom                                                  | Fix                                                                                          |
| ------------------------------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| Missing `WHERE` clause with `@start_ds`/`@end_ds` | Model processes all data on every run                    | Add the `WHERE` filter to restrict inputs to the interval                                    |
| `time_column` in non-UTC timezone                 | Wrong interval boundaries                                | Convert to UTC at ingestion; use `cron_tz` only for scheduling, not for interval calculation |
| `@start_ds` with timestamp column                 | Incorrect date-string comparison                         | Use `@start_dt`/`@end_dt` for timestamp columns                                              |
| Too-strict assertion on incremental               | Assertion checks old data outside the processed interval | Use `@this_model` in assertion queries: Vulcan adds the time range filter automatically      |

***

## Python equivalent

```python
from vulcan import ExecutionContext, model, ModelKindName

@model(
    "analytics.daily_sales_py",
    columns={
        "order_date": "date",
        "product_id": "int",
        "total_sales": "decimal(10,2)",
    },
    kind=dict(
        name=ModelKindName.INCREMENTAL_BY_TIME_RANGE,
        time_column="order_date",
    ),
    grains=["order_date", "product_id"],
    depends_on=["analytics.orders", "analytics.order_items", "analytics.products"],
    start="2024-01-01",
    cron="@daily",
)
def execute(context: ExecutionContext, start, end, **kwargs):
    query = f"""
    SELECT
      o.order_date,
      p.product_id,
      SUM(oi.quantity * oi.unit_price) AS total_sales
    FROM analytics.orders o
    JOIN analytics.order_items oi ON o.order_id = oi.order_id
    JOIN analytics.products p ON oi.product_id = p.product_id
    WHERE o.order_date BETWEEN '{start}' AND '{end}'
    GROUP BY o.order_date, p.product_id
    """
    return context.fetchdf(query)
```


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://v2.dataos.info/build/stage-2-productize/explore-recipes/incremental-by-time.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
