
Exploring Data and Metrics Engineering

AI-Link gives users a detailed view of a semantic model’s folders and hierarchies, along with its numeric and categorical features, before they dive into further analysis. This consistent view of the data model ensures that users in different roles (e.g., business analysts and data scientists) are reasoning about the same data, so insights arrive faster and are easier to interpret across the organization.

For each of the workflows below, we’ll assume we’ve initialized a DataModel object called my_data_model, as described in the Connect to AtScale section.

For more details on the functionality used below, please see our API Reference. If you’re working in a Jupyter Notebook, you can run a cell with any object followed by ? to access its documentation. For instance, the following cell will fetch the documentation for get_data:

from atscale.data_model import data_model

data_model.DataModel.get_data?
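
Outside Jupyter, the ? suffix is not available, but Python’s built-in help() (for example, help(data_model.DataModel.get_data)) prints the same docstring. The sketch below shows the standard-library inspect module retrieving a docstring and signature. So that it runs without AI-Link installed, it uses a hypothetical stand-in class with a made-up docstring rather than the real DataModel.

```python
import inspect


class DataModel:
    """Hypothetical stand-in for AI-Link's DataModel (not the real class)."""

    def get_data(self, feature_list, filter_greater=None):
        """Hypothetical docstring: query the model and return a DataFrame."""
        raise NotImplementedError


# inspect.getdoc returns the cleaned-up docstring, much like the body of `?` output.
doc = inspect.getdoc(DataModel.get_data)
# inspect.signature shows the parameters the function accepts.
sig = inspect.signature(DataModel.get_data)

print(doc)
print(sig)  # (self, feature_list, filter_greater=None)
```

With the real library installed, passing data_model.DataModel.get_data to the same inspect calls should show its actual documentation.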

Semantic Model Exploration

To get started, we can use the following commands to see the features and columns already visible within the AtScale data model:

my_data_model.get_features()

my_data_model.get_column_names()

For any of the features we’ve identified via get_features, we can also get any description associated with it:

my_data_model.get_feature_description(
    feature="one_of_my_feature_names"
)

We can also see all of the folders in our data model:

my_data_model.get_folders()

If we only want the names of the model’s numeric features, we can use:

my_data_model.get_all_numeric_feature_names()

Or the analogous functionality for categorical features:

my_data_model.get_all_categorical_feature_names()
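
These two lists are handy for checking a planned query before running it. The sketch below splits a feature list into numeric and categorical names and flags anything found in neither list, such as a typo. It assumes both functions return plain lists of feature-name strings; the helper split_feature_list and the example names are hypothetical.

```python
def split_feature_list(feature_list, numeric_names, categorical_names):
    """Split requested features into numeric, categorical, and unknown names.

    Names found in neither list are returned separately so they can be fixed
    before querying.
    """
    numeric_set = set(numeric_names)
    categorical_set = set(categorical_names)
    numeric, categorical, unknown = [], [], []
    for name in feature_list:
        if name in numeric_set:
            numeric.append(name)
        elif name in categorical_set:
            categorical.append(name)
        else:
            unknown.append(name)
    return numeric, categorical, unknown


# Example values; in practice these would come from
# my_data_model.get_all_numeric_feature_names() and
# my_data_model.get_all_categorical_feature_names().
numeric_names = ["average_sales", "total_units_sold"]
categorical_names = ["month", "year"]

# "regoin" is a deliberate typo to show how unknown names are caught.
numeric, categorical, unknown = split_feature_list(
    ["month", "year", "average_sales", "total_units_sold", "regoin"],
    numeric_names,
    categorical_names,
)
print(numeric)      # ['average_sales', 'total_units_sold']
print(categorical)  # ['month', 'year']
print(unknown)      # ['regoin']
```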

We can run the following to see all of our data model’s hierarchies:

my_data_model.get_hierarchies()

To list all of the levels in a given hierarchy, we can run:

my_data_model.get_hierarchy_levels(
    hierarchy_name="one_of_my_hierarchy_names"
)
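
A common use of the level list is choosing the grain of a query: every level from the top of the hierarchy down to a target level. The sketch below assumes get_hierarchy_levels returns level names ordered from coarsest to finest; the helper levels_down_to and the example date levels are hypothetical.

```python
def levels_down_to(levels, target):
    """Return hierarchy levels from the top down to and including `target`.

    Assumes `levels` is ordered from the coarsest level to the finest.
    """
    if target not in levels:
        raise ValueError(f"{target!r} is not a level of this hierarchy")
    return levels[: levels.index(target) + 1]


# Example levels for a hypothetical date hierarchy; in practice these would
# come from my_data_model.get_hierarchy_levels(hierarchy_name=...).
date_levels = ["year", "quarter", "month", "day"]
print(levels_down_to(date_levels, "month"))  # ['year', 'quarter', 'month']
```

Assuming level names can also be used as feature names, a result like this could become part of the feature_list passed to get_data.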

With this workflow, an AI-Link user can quickly familiarize themselves with the structure and context of their data model.

Data Exploration

With an understanding of our semantic model, we can now explore its data. The get_data function queries a data model, returning the results in a pandas DataFrame for easy consumption in your Python environment of choice. Accessing your data via AI-Link is as simple as a single function call:

df = my_data_model.get_data(
    feature_list=[
        "month",
        "year",
        "average_sales",
        "total_units_sold",
    ]
)

With get_data, you don’t need to write complex SQL queries to explore your data, even when you want to filter the results by specific values. get_data supports a number of filter parameters that narrow the scope of your exploration. The following example returns only rows after the year 2014 in which the total units sold per month were between 50 and 100:

df = my_data_model.get_data(
    feature_list=[
        "month",
        "year",
        "average_sales",
        "total_units_sold",
    ],
    filter_greater={"year": 2014},
    filter_between={"total_units_sold": (50, 100)}
)
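
To make the filter semantics concrete, here is a plain-Python sketch of what the two filters above select, applied to a few made-up rows. It is an illustration, not how get_data is implemented. It assumes filter_greater is a strict comparison and that filter_between includes both bounds, as SQL BETWEEN does; check the API Reference for the exact behavior. The helper apply_filters is hypothetical.

```python
def apply_filters(rows, filter_greater=None, filter_between=None):
    """Keep rows that satisfy every filter (a plain-Python illustration).

    filter_greater: {column: value} keeps rows where row[column] > value.
    filter_between: {column: (low, high)} keeps rows where
        low <= row[column] <= high (assumed inclusive, like SQL BETWEEN).
    """
    filter_greater = filter_greater or {}
    filter_between = filter_between or {}
    kept = []
    for row in rows:
        above = all(row[col] > value for col, value in filter_greater.items())
        within = all(
            low <= row[col] <= high for col, (low, high) in filter_between.items()
        )
        if above and within:
            kept.append(row)
    return kept


# Made-up sample rows for illustration only.
rows = [
    {"month": 1, "year": 2014, "total_units_sold": 75},   # year not after 2014
    {"month": 2, "year": 2015, "total_units_sold": 40},   # below 50
    {"month": 3, "year": 2015, "total_units_sold": 100},  # upper bound, kept
    {"month": 4, "year": 2016, "total_units_sold": 60},
]
result = apply_filters(
    rows,
    filter_greater={"year": 2014},
    filter_between={"total_units_sold": (50, 100)},
)
print([(r["year"], r["month"]) for r in result])  # [(2015, 3), (2016, 4)]
```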

Since they’re DataFrames, all responses from get_data are readily explorable and visualizable with pandas, Matplotlib, and other industry-favorite Python libraries. For more on how to use get_data (and related functions like get_data_direct), please see the API Reference.

Metrics Engineering

You can use AI-Link to maintain and contribute to your semantic model and its data.

For instance, updating feature metadata such as descriptions, captions, and visibility in BI tools takes a single function call:

my_data_model.update_aggregate_feature(
    feature_name="one_of_my_features",
    description="here's the new description for one_of_my_features"
)
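
When many features need new descriptions, the same call can be made in a loop. The sketch below relies only on the feature_name and description keyword arguments shown above. The helper update_descriptions and the RecordingModel stand-in (which records calls instead of contacting AtScale) are hypothetical; in practice you would pass my_data_model in place of the stand-in.

```python
class RecordingModel:
    """Hypothetical stand-in that records calls instead of contacting AtScale."""

    def __init__(self):
        self.calls = []

    def update_aggregate_feature(self, feature_name, description):
        self.calls.append((feature_name, description))


def update_descriptions(model, descriptions):
    """Make one update_aggregate_feature call per (feature, description) pair."""
    for feature_name, description in descriptions.items():
        model.update_aggregate_feature(
            feature_name=feature_name,
            description=description,
        )


model = RecordingModel()
update_descriptions(
    model,
    {
        "average_sales": "example description for average_sales",
        "total_units_sold": "example description for total_units_sold",
    },
)
print(model.calls)
```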

AI-Link also allows users to contribute new information to semantic models from their Python environment, promoting a common understanding of data across an organization regardless of who generates insights. The following function, for example, creates a new secondary attribute on my_data_model:

my_data_model.create_secondary_attribute(
    dataset_name="example_dataset",
    column_name="example_column",
    new_attribute_name="example_attr",
    hierarchy_name="example_hierarchy",
    level_name="example_level"
)
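
As the parameters above suggest, a secondary attribute attaches a dataset column to a level of a hierarchy. If you expect that column to hold a single value for each member of the level, a quick check on a sample of the source rows can catch members with conflicting values before you create the attribute. This is an optional sanity check of our own, not a documented AtScale requirement; the helper conflicting_members, the column name product, and the rows are hypothetical.

```python
from collections import defaultdict


def conflicting_members(rows, level_column, attribute_column):
    """Return level members that map to more than one attribute value.

    An empty dict means every member has exactly one value in attribute_column.
    """
    values_by_member = defaultdict(set)
    for row in rows:
        values_by_member[row[level_column]].add(row[attribute_column])
    # Sort each conflicting value set so the output is deterministic.
    return {
        member: sorted(values)
        for member, values in values_by_member.items()
        if len(values) > 1
    }


# Made-up sample rows; "product" plays the role of the level's key column and
# "example_column" is the column planned as the secondary attribute.
rows = [
    {"product": "P1", "example_column": "red"},
    {"product": "P1", "example_column": "red"},
    {"product": "P2", "example_column": "red"},
    {"product": "P2", "example_column": "blue"},
]
print(conflicting_members(rows, "product", "example_column"))  # {'P2': ['blue', 'red']}
```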

For a full list of functions useful for metrics engineering, please see our API Reference.