Setting Properties to Allow Incremental Rebuilds of Aggregates
This section describes how to enable incremental rebuilds for your aggregates.
Prerequisites
Ensure that you understand how incremental builds work by reading these topics:
Also be aware of the following:
-
You must be sure that any dimensions joined to the dataset you're enabling incremental aggregates for rarely, if ever, change outside of the grace period. When the AtScale engine performs an incremental rebuild, it does not search through existing rows in the aggregate to replace old values or include newly appended values. As more changes are made to a dimension outside of the grace period, the aggregate instance becomes less accurate.
If changes are made outside of the grace period, you should trigger a full rebuild of the aggregate table. You can do this from the deployed model's Build Tab; for more information, see Performing Full Rebuilds of Incremental Aggregates. However, if changes are made to joined dimensions within the grace period, there is no need for a full rebuild of the aggregate.
For full details on enabling incremental rebuilds of aggregates that use joins, see Aggregates for Fact Datasets that Use Joins.
-
If your incremental aggregates include metrics from multiple fact tables, those metrics must have Override Default Handling enabled and set to Include repeated values in query results. For more information, see About Queries on Dimensions that are Unrelated to One or More Queried Metrics.
About this task
Allowing incremental rebuilds of aggregates does not cause rebuilds to be incremental. After an aggregate is defined, the AtScale engine determines the type of rebuild it will use for the instances of the aggregate. The first instance of an aggregate is built with a full build. Rebuilds occur to refresh aggregates with new or changed data. After the AtScale engine determines which type of rebuild to use for an aggregate, it always uses that type of rebuild unless the model is edited and redeployed. After redeploying, the AtScale engine again determines which type of rebuild to use for a defined aggregate. The choice that the engine makes can differ from the choice it made for the previous version of the model.
Enable view-based incremental aggregate rebuilds
To configure AtScale's standard, view-based incremental aggregate rebuilds:
-
In Design Center, locate the dataset you want to enable incremental builds for, and open it for editing.
-
In the Dataset properties panel, enable the Allow incremental builds option.
-
In the Incremental indicator field, select the column to use as the incremental indicator.
The dropdown list shows available columns with the following data types: Long, Integer, Timestamp, DateTime, or Decimal (38,0) (Snowflake only). The column you select must have values that increase monotonically, such as a numeric UNIX timestamp showing seconds since epoch, or a Timestamp/DateTime. The values in this column enable the query engine both to append rows to an aggregate table and update rows during an incremental rebuild.
NoteIf the dataset does not contain a column that meets this criteria, you may have to create a calculated column to transform another column into a supported data type.
-
In the Grace period field, specify the grace period.
When the AtScale engine starts an incremental build, the grace period determines how far back in time the engine looks for updates to rows in the dataset; for example, one week or 15 days.
The value you provide should be an integer, followed by the time unit. The time unit can be any of the following:
s(second),m(minute),h(hour),d(day),w(week).For example, setting the value to
100ssets the grace period to 100 seconds. Setting it to1wsets the grace period to one week. -
Click Apply.
-
Deploy the catalog.
Enable partition-based incremental aggregate rebuilds
Partition-based incremental aggregates are a Public Preview feature.
Although AtScale's standard, view-based incremental aggregate functionality accelerates aggregate build times, it also adds a significant run-time penalty to report queries that select from incrementally-built aggregate tables.
If you have encountered this issue, you may want to enable AtScale's partition-based incremental aggregates. This functionality leverages the data warehouse's native partition (or cluster block) pruning capabilities when building and querying incremental aggregates, thereby reducing the run-time query overhead observed in the view-based implementation.
Partition-based incremental agggregates are only supported for the following data warehouse platforms: Snowflake, Databricks, BigQuery.
The following sections describe how to configure AtScale to perform partition-based incremental aggregate builds.
Configure the partition strategy
First, you must configure the system to build partition-based incremental aggregates and select the partition strategy you want it to use. These are controlled by the aggregate.incrementalUpdate.partition.strategy setting, which is available at both the global and model levels. This setting supports the following values:
none: Default. AtScale does not build partition-based incremental aggregates. View-based incremental aggregates are built instead.distribution: The system builds partition-based incremental aggregates and uses statistics to determine the partition key.user: The system builds partition-based incremental aggregates. The user defines the partition key.indicator: The system builds partition-based incremental aggregates. The incremental indicator is used as the partition key.
For more information on each partition strategy, see About Incremental Rebuilds.
When you switch from view-based incremental aggregates to partition-based, AtScale deactivates all existing incrementally-built aggregates and builds new ones. Depending on the data sizes involved, this may take a significant amount of time and should be planned for. The shorter build time of partition-based incremental aggregates will be achieved on subsequent builds.
Configure partition-based incremental builds in your model
Next, you must configure your models to support partition-based incremental aggregate builds:
-
In Design Center, locate the dataset you want to enable incremental builds for, and open it for editing.
-
In the Dataset properties panel, enable the Allow incremental builds option.
-
In the Incremental indicator field, select the column to use as the incremental indicator. If you are using the
distributionoruserpartition strategy, this column should reference a dimension.Known IssueThere is a known issue with partition-based incremental aggregate builds that arises when the incremental indicator column is a number type, such as Long, Decimal(38,0), or BigInteger. When using such a type, the incremental build routine does not correctly convert the grace period time offset and rebuilds more records than specified by the grace period parameter. This issue does not affect models that use an incremental indicator column of a date-like type, such as Date, Timestamp, or DateTime.
The dropdown list shows available columns with the following data types: Long, Integer, Timestamp, DateTime, or Decimal (38,0) (Snowflake only). The column you select must have values that increase monotonically, such as a numeric UNIX timestamp showing seconds since epoch, or a Timestamp/DateTime. The values in this column enable the query engine both to append rows to an aggregate table and update rows during an incremental rebuild.
NoteIf the dataset does not contain a column that meets this criteria, you may have to create a calculated column to transform another column into a supported data type.
-
In the Grace period field, specify the grace period.
When the AtScale engine starts an incremental build, the grace period determines how far back in time the engine looks for updates to rows in the dataset; for example, one week or 15 days.
The value you provide should be an integer, followed by the time unit. The time unit can be any of the following:
s(second),m(minute),h(hour),d(day),w(week). For example, setting the value to100ssets the grace period to 100 seconds. Setting it to1wsets the grace period to one week. -
Click Apply.
-
If you are using the
userpartition strategy, define partition keys for your aggregates:- Open your model and switch to the Code Editor tab.
- For UDAs, ensure the
aggregates>attributesproperty contains thepartition: keyelement. For more information, refer to the AtScale SML Object Documentation on GitHub. - For system-defined aggregates, ensure the model file contains the
partitionsproperty. For more information, refer to the AtScale SML Object Documentation on GitHub. - Click Save File.
-
Deploy the catalog.
Example
You must follow these steps if you want the AtScale engine to perform incremental aggregate rebuilds for joined dimensions.
Suppose you have a model with four dimensions: Customer, Date, Order, and Product. The datasets these dimensions are based on have the following settings for the immutable property in SML:
| Dimension | Immutable |
|---|---|
| Customer | True |
| Date | True |
| Order | False |
| Product | False |
For more information on the immutable property, refer to the AtScale SML Object Documentation on GitHub.
After you deploy the model and queries start to run against it, the AtScale engine defines the following aggregates and creates instances of them:
| Name of Aggregate | Included Dimensions | Possible Types of Rebuild |
|---|---|---|
| Agg A | Customer | Full, Incremental |
| Agg B | Order | Full |
| Agg C | Customer, Date | Full, Incremental |
| Agg D | Order, Product | Full |
| Agg E | Customer, Order | Full |
| Agg F | Customer, Date, Order, Product | Full |
The engine will not perform incremental rebuilds if the following conditions are true:
- A proposed aggregate definition contains a join to at least one dimension that is not declared safe for incremental rebuilds.
- The
aggregate.create.joins.allowPreventIncremental.enabledglobal setting is set totrue.
In this case, a full rebuild is used whether the aggregate is defined by the AtScale engine or a data modeler.
If aggregate.create.joins.allowPreventIncremental.enabled is set to false, then the aggregate table will use incremental rebuilds, but will not join to dimensions that are unsafe for incremental rebuilds.