How Aggregate Tables Are Populated With Data
After an aggregate table is defined, an instance of it needs to be created. An instance is an actual table that contains the aggregated data. The process of creating an instance of an aggregate table and populating it is referred to as a build of the table.
Consider that for an aggregate instance to be built there should be an aggregate definition for it. An aggregate instance can be built and used only after the corresponding System-Defined or User-Defined aggregate table definition is created. If you want to see what definitions you have in AtScale you can check the Definitions tab on the Aggregates page.
For more information about how aggregates are defined, built, and used, see About Aggregates, Aggregate Definitions and Aggregate Instances, Life Cycle of Aggregate Tables, and Defining Aggregates Yourself.
Updating aggregate tables
Aggregate-table instances become outdated as new data is ingested and existing data is updated. There are three actions that the engine can take to bring aggregate tables up to date:
-
Supersede the current instance of an aggregate table with an entirely new instance. The new instance includes more recent data and uses the existing definition for the aggregate table. This is called a full rebuild of an aggregate table.
The previous instance stays in place as a backup in case the building of the new instance fails. Queries continue to use the previous instance until the new instance is ready.
The AtScale engine rebuilds aggregate instances by querying the fact tables and dimensions that the definitions of those aggregates are based on. However, if the engine setting
AGGREGATES.CREATE.BUILDFROMEXISTING.ENABLED
is set toTrue
and the engine decides that an aggregate instance A can be rebuilt from data that is in the rebuilt instance B of a different aggregate table, then the engine runs a query on B to populate A. Such a query generally runs much faster than a query on fact tables and dimensions. When rebuilds are triggered for all of the aggregate tables for a model, the AtScale engine determines which aggregates could be built from other aggregates. The latter are given higher priority in the order of the rebuilds, so that the former can be rebuilt from them.
Instances of incremental aggregates are not rebuilt from the data that is in instances of other aggregate tables.
- Stop using the current system definition for an aggregate table and create no new instance of the table. Instead, create an instance of a newly defined aggregate table that supersedes the previous definition. The new instance includes the most recent data.
- Supersede the current instance of an aggregate table by a) duplicating that instance, and then b) updating the rows in the duplicate and appending new rows to it. This is called incremental rebuild of an aggregate table. For more details about this type or rebuild, see About Incremental Rebuilds.
By default, all of the aggregate tables for a model are refreshed with full rebuilds. You can set an SML property on a model's fact dataset that causes all rebuilds to be incremental rebuilds. If your model has more than one fact dataset, the aggregate tables will be built incrementally only for those fact datasets on which the flag is set.
Aggregate table instances are not built or rebuilt individually. Rather, all aggregate table instances for a model are built or rebuilt together in a batch build process. A process builds or rebuilds one aggregate at a time in succession. You can use engine settings to influence the order in which aggregate instances are processed.
How Batch Build Processes For Models Are Started
Many companies have ETL (extract, transform, load) jobs that they routinely run to cleanse raw data and load it into a data warehouse. Before it ends, a job can call one of AtScale's APIs to trigger a build process. Alternatively, if a job is configured to write to a log file when it finishes, you can have AtScale monitor a path in a distributed file system for this signal that the job is complete. AtScale can then start a build process.
If you do not run routine ETL jobs, or you do not want to wait for one to run, you can click a button in Design Center to start a build process.
Automatic restarts of interrupted batch-build processes
As an aggregate instance for a model is built for the first time or rebuilt, the build process assigns it a status. If a build process is interrupted by the AtScale engine stopping, restarting the engine restarts the build process automatically. The status of an aggregate instance at the time of the interruption determines what the process does with that instance when the process restarts.
- If the status of an instance is New, the instance has not yet been created and the process creates it.
- If the status of an instance is In Progress, the process restarts work on the instance.
- If the status of an instance is Done, the process ignores the instance.
- If the status of an instance is Failed, the process attempts again to create the instance, if the value of the engine setting AGGREGATES.BATCH.RETRY.MAXATTEMPTSPERAGGREGATE has not yet been reached for that instance.
If the build process completes successfully, the status of all aggregate instances that the process created is set to Active.