Setting Properties to Allow Incremental Rebuilds of Aggregates
You can enable incremental rebuilds of aggregates for your models.
Before you begin
Ensure that you understand how incremental builds work before you configure them.
About this task
Allowing incremental rebuilds of aggregates does not cause rebuilds to be incremental. After an aggregate is defined, the AtScale engine determines the type of rebuild it will use for the instances of the aggregate. The first instance of an aggregate is built with a full build. Rebuilds occur to refresh aggregates with new or changed data. After the AtScale engine determines which type of rebuild to use for an aggregate, it always uses that type of rebuild unless the model is edited and redeployed. After redeploying, the AtScale engine again determines which type of rebuild to use for a defined aggregate. The choice that the engine makes can differ from the choice it made for the previous version of the model.
Procedure
To configure incremental aggregate rebuilds:
- In Design Center, locate the dataset for which you want to enable incremental builds, and open it for editing.
- In the Dataset properties panel, enable the Allow incremental builds option.
- In the Incremental indicator field, select the column to use as the incremental indicator.
  The dropdown list shows available columns with the following data types: Long, Integer, Timestamp, DateTime, or Decimal (38,0) (Snowflake only). The column you select must have values that increase monotonically, such as a numeric UNIX timestamp showing seconds since epoch, or a Timestamp/DateTime. The values in this column enable the query engine both to append rows to an aggregate table and to update rows during an incremental rebuild.
  Note: If the dataset does not contain a column that meets these criteria, you may have to create a calculated column to transform another column into a supported data type.
- In the Grace period field, specify the grace period.
  When the AtScale engine starts an incremental build, the grace period determines how far back in time the engine looks for updates to rows in the dataset; for example, one week or 15 days.
  The value you provide must be an integer followed by a time unit. The time unit can be any of the following: s (second), m (minute), h (hour), d (day), w (week).
  For example, setting the value to '100s' sets the grace period to 100 seconds. Setting it to '1w' sets the grace period to one week.
- Click Apply.
- Deploy the project.
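The grace period format described in the procedure (an integer followed by one of s, m, h, d, or w) can be illustrated with a short Python sketch. The `parse_grace_period` function is a hypothetical helper written for this illustration, not part of AtScale, and it assumes there is no whitespace between the number and the unit, as in the '100s' and '1w' examples.

```python
import re
from datetime import timedelta

# Unit suffixes listed in the procedure, mapped to timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_grace_period(value: str) -> timedelta:
    """Convert a grace period string such as '100s' or '1w' to a timedelta.

    Hypothetical helper for illustration only; AtScale's own validation
    of this field may differ.
    """
    match = re.fullmatch(r"(\d+)([smhdw])", value.strip())
    if match is None:
        raise ValueError(
            f"expected an integer followed by s, m, h, d, or w: {value!r}"
        )
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_grace_period("100s"))  # 0:01:40
print(parse_grace_period("1w"))    # 7 days, 0:00:00
```

A value such as 'one week' or '1.5d' raises a `ValueError` in this sketch, which mirrors the requirement that the value be an integer followed by a single unit letter.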
You must be sure that any dimensions joined to the dataset rarely, if ever, change outside of the grace period. When the AtScale engine performs an incremental rebuild, it does not search through existing rows in the aggregate to replace old values or include newly appended values. As more changes are made to a dimension outside of the grace period, the aggregate instance becomes less accurate.
If changes are made outside of the grace period, you should trigger a full rebuild of the aggregate table. You can do this from the deployed model's Build Tab; for more information, see Performing Full Rebuilds of Incremental Aggregates. However, if changes are made to joined dimensions within the grace period, there is no need for a full rebuild of the aggregate.
For full details on enabling incremental rebuilds of aggregates that use joins, see Aggregates for Fact Datasets that Use Joins.
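As a rough way to reason about when a full rebuild is warranted, the following Python sketch checks whether any change falls before the start of the grace-period window. It is an illustration under stated assumptions, not the engine's logic: `needs_full_rebuild` is a hypothetical helper, the window is assumed to be measured back from the moment the incremental rebuild starts, and each change is assumed to carry a timestamp comparable to the incremental indicator.

```python
from datetime import datetime, timedelta


def needs_full_rebuild(change_times, rebuild_start, grace_period):
    """Return True if any change predates the grace-period window.

    Hypothetical helper: assumes the window runs from
    (rebuild_start - grace_period) up to rebuild_start.
    """
    window_start = rebuild_start - grace_period
    return any(changed_at < window_start for changed_at in change_times)


rebuild_start = datetime(2024, 6, 15)
grace_period = timedelta(weeks=1)  # corresponds to a '1w' setting

recent_changes = [datetime(2024, 6, 12)]
older_changes = [datetime(2024, 6, 12), datetime(2024, 5, 30)]

print(needs_full_rebuild(recent_changes, rebuild_start, grace_period))  # False
print(needs_full_rebuild(older_changes, rebuild_start, grace_period))   # True
```

In the second call, the change dated May 30 is older than the one-week window, so the sketch reports that a full rebuild would be needed.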
Example
You must follow these steps if you want the AtScale engine to perform incremental aggregate rebuilds for joined dimensions.
Suppose you have a model with four dimensions: Customer, Date, Order, and Product. The datasets these dimensions are based on have the following settings for the immutable property in SML:
Dimension | Immutable |
---|---|
Customer | True |
Date | True |
Order | False |
Product | False |
For more information on the immutable property, refer to the AtScale SML Object Documentation on GitHub.
After you deploy the model and queries start to run against it, the AtScale engine defines the following aggregates and creates instances of them:
Name of Aggregate | Included Dimensions | Possible Types of Rebuild |
---|---|---|
Agg A | Customer | Full, Incremental |
Agg B | Order | Full |
Agg C | Customer, Date | Full, Incremental |
Agg D | Order, Product | Full |
Agg E | Customer, Order | Full |
Agg F | Customer, Date, Order, Product | Full |
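The pattern in the table can be reproduced with a short Python sketch. It encodes one reading of the example: an aggregate is eligible for incremental rebuilds only when every dimension it includes is based on a dataset marked immutable. The `possible_rebuild_types` function and the dictionaries are hypothetical names used for illustration; the engine's actual decision process is not shown here.

```python
# Immutable settings from the first table in this example.
immutable = {"Customer": True, "Date": True, "Order": False, "Product": False}

# Aggregates and their included dimensions from the second table.
aggregates = {
    "Agg A": ["Customer"],
    "Agg B": ["Order"],
    "Agg C": ["Customer", "Date"],
    "Agg D": ["Order", "Product"],
    "Agg E": ["Customer", "Order"],
    "Agg F": ["Customer", "Date", "Order", "Product"],
}


def possible_rebuild_types(dimensions, immutable):
    """Full is always possible; Incremental only if all dimensions are immutable."""
    types = ["Full"]
    if all(immutable[d] for d in dimensions):
        types.append("Incremental")
    return types


for name, dims in aggregates.items():
    print(f"{name}: {', '.join(possible_rebuild_types(dims, immutable))}")
```

Only Agg A and Agg C include immutable dimensions exclusively, so they are the only aggregates for which the sketch lists Incremental.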
The engine will not perform incremental rebuilds if both of the following conditions are true:
- A proposed aggregate definition contains a join to at least one dimension that is not declared safe for incremental rebuilds.
- The AGGREGATE.CREATE.JOINS.ALLOWPREVENTINCREMENTAL.ENABLED global setting is set to True.
In this case, a full rebuild is used whether the aggregate is defined by the AtScale engine or a data modeler.
If AGGREGATE.CREATE.JOINS.ALLOWPREVENTINCREMENTAL.ENABLED is set to False, then the aggregate table will use incremental rebuilds, but will not join to dimensions that are unsafe for incremental rebuilds.
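The effect of the global setting can be summarized in a small Python sketch. It is a simplified model of the two cases described above, not the engine's implementation: `plan_rebuild` is a hypothetical function, the setting is passed in as a plain boolean, and a dimension is treated as unsafe for incremental rebuilds when its dataset is not marked immutable, as in the example tables.

```python
def plan_rebuild(dimensions, immutable, allow_prevent_incremental):
    """Return (rebuild_type, joined_dimensions) for a proposed aggregate.

    allow_prevent_incremental stands in for the
    AGGREGATE.CREATE.JOINS.ALLOWPREVENTINCREMENTAL.ENABLED global setting.
    """
    unsafe = [d for d in dimensions if not immutable[d]]
    if unsafe and allow_prevent_incremental:
        # Unsafe joins are kept, which forces full rebuilds.
        return "full", list(dimensions)
    # Unsafe joins are dropped, so incremental rebuilds remain possible.
    return "incremental-eligible", [d for d in dimensions if immutable[d]]


immutable = {"Customer": True, "Date": True, "Order": False, "Product": False}
agg_e = ["Customer", "Order"]

print(plan_rebuild(agg_e, immutable, allow_prevent_incremental=True))
# ('full', ['Customer', 'Order'])
print(plan_rebuild(agg_e, immutable, allow_prevent_incremental=False))
# ('incremental-eligible', ['Customer'])
```

The label "incremental-eligible" is deliberate: allowing incremental rebuilds does not guarantee them, because the engine still chooses the rebuild type for each aggregate.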