Best Practices

What are some tips for getting started?

We recommend testing with a data model that has at least one time hierarchy and one standard hierarchy. This better illustrates the breadth of AI-Link’s capabilities for programmatically interacting with AtScale data models.

The following command installs the latest version of AI-Link:

pip install atscale --upgrade

See our warehouse extras for more on maintaining packages via pip that are specific to your data warehouse.
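
To confirm which version ended up installed after running the pip command, a small check like the following can help. This is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and is not part of AI-Link.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "atscale" is the distribution name used in the pip command above.
print(installed_version("atscale"))
```

If this prints None, the package is not installed in the environment your kernel is using.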

What if I have a really large dataset? Is there a way to sample data?

Many of our customers manage very large datasets in their warehouses. Although AtScale helps optimize query performance, two limits still apply: the number of rows you can fetch per query (configurable in your AtScale settings) and the amount of data your local kernel can handle.

To address this, we recommend sampling your data with a get_data call using the function’s filtering parameters. One way to do this is to filter on a categorical variable. For instance, this call returns average sales data at the month level, but only for stores in Missouri:

my_data_model.get_data(
    feature_list=["month", "average_sales"],
    filter_in={
        "state": ["Missouri"],
    },
)

This call also returns month-level average sales data, but only for dates after January 23rd, 2024:

my_data_model.get_data(
    feature_list=["date", "month", "average_sales"],
    filter_greater={
        "date": "01-23-24",
    },
)
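
If a single filtered query is still too large, one option is to walk through a categorical variable a few values at a time, so each query stays under the row limit and each result can be processed before the next is fetched. The following is a minimal sketch, not an AI-Link feature: the helper fetch_in_batches and its batch_size parameter are hypothetical, and it relies only on the feature_list and filter_in parameters of get_data shown above.

```python
from typing import Any, Iterator, List, Sequence


def fetch_in_batches(
    data_model: Any,
    feature_list: List[str],
    column: str,
    values: Sequence[str],
    batch_size: int = 5,
) -> Iterator[Any]:
    """Yield one get_data result per batch of categorical values.

    Hypothetical helper: `data_model` is assumed to expose
    get_data(feature_list=..., filter_in=...) as in the examples above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(values), batch_size):
        # Restrict each query to a small slice of the categorical values.
        batch = list(values[start:start + batch_size])
        yield data_model.get_data(
            feature_list=feature_list,
            filter_in={column: batch},
        )


# Example usage (assumes my_data_model already exists):
# for chunk in fetch_in_batches(
#     my_data_model,
#     ["month", "average_sales"],
#     "state",
#     ["Missouri", "Kansas", "Iowa"],
#     batch_size=2,
# ):
#     ...  # process or save each chunk before fetching the next
```

Because the helper is a generator, only one batch is held in memory at a time unless you choose to collect the results.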