Exploring Data
AI-Link offers users a detailed view of the folders and hierarchies, as well as the numeric and categorical features, available in a Semantic Model before they dive into further analysis. This consistent view of a data model ensures that users across different roles (e.g., business analysts and data scientists) are all reasoning about the same data, so insights arrive faster and are understood the same way across their organization.
For each of the workflows below, we’ll assume we’ve initialized a DataModel object called my_data_model via a process like the one described in the Connect to AtScale section.
For more details on the functionality used below, please see our API Reference. If you’re working in a Jupyter Notebook, you can run a cell with any object followed by ? to access its documentation. For instance, the following cell will fetch the documentation for get_data:
from atscale.data_model import data_model
data_model.DataModel.get_data?
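The trailing ? is IPython/Jupyter syntax and is not valid in a plain Python script. As a rough equivalent outside a notebook, the standard library's help and inspect.getdoc read the same docstrings. The sketch below uses a hypothetical helper, first_doc_line, and demonstrates it on json.dumps so that it runs without the atscale package installed. Passing data_model.DataModel.get_data instead should behave the same way, assuming that method carries a docstring.

```python
import inspect
import json


def first_doc_line(obj):
    """Return the first line of obj's docstring, or an empty string if it has none."""
    doc = inspect.getdoc(obj)  # normalizes indentation; None when undocumented
    return doc.splitlines()[0] if doc else ""


# Stand-in target so the example runs anywhere; swap in
# data_model.DataModel.get_data when the atscale package is available.
print(first_doc_line(json.dumps))
```

For the full text rather than a one-line summary, help(json.dumps) prints the complete docstring to the console.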
Semantic Model Exploration
To get started, we can use the following commands to see the features and columns already visible within the AtScale data model:
my_data_model.get_features()
my_data_model.get_column_names()
For any of the features we’ve identified via get_features, we can also get any description associated with it:
my_data_model.get_feature_description(
    feature="one_of_my_feature_names"
)
We can also see all of the folders in our data model:
my_data_model.get_folders()
Perhaps we only want to see the names of the model’s numeric features. In that case, we can use this:
my_data_model.get_all_numeric_feature_names()
Or the analogous functionality for categorical features:
my_data_model.get_all_categorical_feature_names()
We can run the following to see all of our data model’s hierarchies:
my_data_model.get_hierarchies()
Suppose we want to know all of the levels of a given hierarchy. In that case, we can do this:
my_data_model.get_hierarchy_levels(
    hierarchy_name="one_of_my_hierarchy_names"
)
With such a workflow, AI-Link users can quickly familiarize themselves with the structure and context of their data model.
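The exploration calls above can be gathered into a single overview. The following is a minimal sketch, not part of AI-Link: summarize_model is a hypothetical helper, and StubDataModel is a made-up stand-in that exposes the same method names so the example runs without an AtScale connection. The return shapes (simple collections of names) are assumptions; please check the API Reference for the actual return types.

```python
def summarize_model(data_model):
    """Collect a data model's structure into one plain dictionary."""
    hierarchies = {
        # Iterating yields names whether get_hierarchies returns a list or a
        # dict keyed by hierarchy name (the exact type is an assumption here).
        name: list(data_model.get_hierarchy_levels(hierarchy_name=name))
        for name in data_model.get_hierarchies()
    }
    return {
        "folders": list(data_model.get_folders()),
        "numeric_features": list(data_model.get_all_numeric_feature_names()),
        "categorical_features": list(data_model.get_all_categorical_feature_names()),
        "hierarchies": hierarchies,
    }


class StubDataModel:
    """Hypothetical stand-in for a connected DataModel; all values are made up."""

    def get_folders(self):
        return ["Sales", "Time"]

    def get_all_numeric_feature_names(self):
        return ["average_sales", "total_units_sold"]

    def get_all_categorical_feature_names(self):
        return ["month", "year"]

    def get_hierarchies(self):
        return ["date_hierarchy"]

    def get_hierarchy_levels(self, hierarchy_name):
        return {"date_hierarchy": ["year", "month"]}[hierarchy_name]


summary = summarize_model(StubDataModel())
for section, contents in summary.items():
    print(f"{section}: {contents}")
```

With a real connection, the same helper would be called as summarize_model(my_data_model).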
Data Exploration
With an understanding of our semantic model, we can now explore its data. The get_data function queries a data model, returning the results in a pandas DataFrame for easy consumption in your Python environment of choice. Accessing your data via AI-Link is as simple as a single function call:
df = my_data_model.get_data(
    feature_list=[
        "month",
        "year",
        "average_sales",
        "total_units_sold",
    ]
)
With get_data, writing complex SQL queries for data exploration is unnecessary, including in cases where you’d like to filter your results by different values. get_data supports a number of filter parameters that allow you to narrow the scope of your exploration. See the following example, which only returns rows after the year 2014 where the total units sold per month were between 50 and 100:
df = my_data_model.get_data(
    feature_list=[
        "month",
        "year",
        "average_sales",
        "total_units_sold",
    ],
    filter_greater={"year": 2014},
    filter_between={"total_units_sold": (50, 100)}
)
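To make the intent of the two filters concrete, here is a rough pandas-only sketch of the same conditions applied to a small, made-up DataFrame. It does not call AI-Link at all. Whether filter_between treats its endpoints as inclusive is an assumption here (pandas' Series.between includes both by default), so please confirm the exact semantics in the API Reference.

```python
import pandas as pd

# Made-up sample rows shaped like the get_data result above.
sample = pd.DataFrame(
    {
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "year": [2014, 2015, 2015, 2016],
        "average_sales": [10.5, 12.0, 9.75, 11.25],
        "total_units_sold": [60, 75, 120, 50],
    }
)

# Mirrors filter_greater={"year": 2014}: strictly after 2014.
after_2014 = sample["year"] > 2014
# Mirrors filter_between={"total_units_sold": (50, 100)}.
# Series.between includes both endpoints by default.
in_range = sample["total_units_sold"].between(50, 100)

# Both conditions must hold for a row to be kept.
filtered = sample[after_2014 & in_range]
print(filtered)
```

In this sample, the January row is dropped because its year is not greater than 2014, and the March row is dropped because 120 units falls outside the range.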
Since they’re DataFrames, all responses from get_data are readily explorable and visualizable with pandas, Matplotlib, and other industry-favorite Python libraries. For more on how to use get_data (and related functions like get_data_direct), please see the API Reference.
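As a small illustration of that follow-on exploration, the sketch below runs a few standard pandas operations on a made-up DataFrame standing in for a get_data result. The column names follow the examples above, and the values are invented for demonstration only.

```python
import pandas as pd

# Made-up stand-in for a DataFrame returned by get_data.
df = pd.DataFrame(
    {
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "year": [2014, 2015, 2015, 2016],
        "average_sales": [10.5, 12.0, 9.75, 11.25],
        "total_units_sold": [60, 75, 120, 50],
    }
)

# Inspect column types before doing any analysis.
print(df.dtypes)

# Aggregate units sold per year.
units_per_year = df.groupby("year")["total_units_sold"].sum()
print(units_per_year)

# Find the month with the highest average sales.
best_month = df.loc[df["average_sales"].idxmax(), "month"]
print(best_month)
```

If Matplotlib is installed, units_per_year.plot(kind="bar") would turn the yearly totals into a bar chart.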