General FAQs
What is AI-Link?
AI-Link is a Python package that delivers the power of AtScale's Semantic Layer in a way that can be leveraged for data-driven projects. These range from Data Science projects that build on top of the Semantic Layer to Data Engineering projects meant to refine and enhance the content of the Semantic Layer itself.
Designed to enable programmatic interaction with the Semantic Layer, AI-Link is not a mechanism to replicate the entire data modeling experience offered via the AtScale canvas.
Why do I need AI-Link?
AI-Link solves several common challenges faced when deriving value from data in a production environment:
- It provides a uniform set of feature definitions. This gives data engineers, business intelligence analysts, data scientists, and compliance teams a unified understanding of your organization's data, regardless of who uses it and how.
- It enables data professionals to stick to their tools of choice. While a BI analyst might work in platforms like Excel or Tableau, a data scientist might prefer Python-based environments like Jupyter Notebooks and libraries like pandas, scikit-learn, and PyTorch. AI-Link facilitates collaboration among all contributors to a data initiative, regardless of the tools they use.
- It bridges the gap between AI and BI initiatives. With AI-Link, code-first data scientists can write model predictions directly into the Semantic Layer, making them accessible via BI dashboards.
- It accelerates data engineering and exploration. AI-Link allows users to scale and transform their features in seconds, as well as create new ones for common use cases (e.g., time series analysis).
What user permissions are necessary in AtScale and my data warehouse to use AI-Link?
Within AtScale, your user account will need to have the Design Center and Runtime Query User permissions.
Within your data warehouse, you will need:
- Access to a development or sandbox database in case you want to experiment with non-production data (see Prerequisites).
- Read access for functionality that pulls information from the Semantic Layer, such as `get_data`, or performs READ operations on the Semantic Layer itself, like `get_features`.
- Write access for functionality like `writeback`, Auto Semantic Model Creation, and non-UDF-based Semantic Inference (see the connection sketch below).
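Functionality that touches the warehouse directly is driven by a database connection object whose grants determine what you can do. As a hedged illustration only – the class name, import path, and parameters below are assumptions modeled on a Snowflake-style connection, not a verbatim API reference:

```python
# Illustrative warehouse connection; the class name, import path, and
# parameters below are assumptions, not a verbatim copy of the AI-Link API.
from atscale.db.connections import Snowflake  # assumed import path

warehouse_connection = Snowflake(
    username="svc_ailink",      # account whose grants determine read/write access
    account="your_account",
    warehouse="your_warehouse",
    database="SANDBOX_DB",      # a development/sandbox database, per the first bullet
    schema="PUBLIC",
    role="AILINK_ROLE",
    password="your_password",
)
```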
Is there anything I need to do before I start using AI-Link?
Basic requirements include a Python version no earlier than 3.9, a coding environment (e.g., a Jupyter Notebook), and an existing AtScale data model. Please refer to the Getting Started section for more detailed prerequisites.
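As a quick sanity check, the sketch below verifies the Python version and connects to an existing data model. It assumes the package is installed as `atscale` and exposes a `Client` class; treat the import path, parameters, and selection helpers as illustrative assumptions, and consult Getting Started for the authoritative steps.

```python
# Illustrative setup sketch; import path, class name, and parameters are
# assumptions, not a verbatim copy of the AI-Link API.
import sys

# AI-Link requires Python 3.9 or later.
assert sys.version_info >= (3, 9), "AI-Link requires Python 3.9+"

from atscale.client import Client  # assumed import path

# Hypothetical connection details for your AtScale instance.
client = Client(
    server="https://your-atscale-host",
    organization="your_org",
    username="your_user",
    password="your_password",
)
project = client.select_project()         # assumed helper for picking a project
data_model = project.select_data_model()  # assumed helper for picking a model
```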
What sorts of use cases does AI-Link address?
AI-Link is designed to augment existing AtScale data models with insights (e.g., predictive models) generated by code-first data professionals. AI-Link helps businesses answer questions such as:
- What will my sales look like next year?
- Which of our customer cohorts will contribute the most to our revenue?
- When should we replace the machinery on our manufacturing sites?
Is AI-Link available in any other programming languages?
AI-Link is a Python library; it currently does not support any other programming languages.
What are the key capabilities of AI-Link?
Some key capabilities include:
- A business-vetted library of features to ensure consistency across your organization.
- Tools to scale and transform features in preparation for data science applications.
- Tools for exploratory data analysis that operate at the warehouse level – no data movement required.
- Pandas and Spark DataFrame support through a Python connector to enable working with any open-source library, machine learning framework, or AutoML tool.
- Writeback for pushing model predictions and feature importance reports to source databases and/or existing AtScale data models, where they can be consumed via BI tools (see the sketch after this list).
- Semantic Inference to embed machine learning models in your Semantic Layer, flexibly generating predictions and exposing them to the BI tools of your choice.
- Auto Semantic Model Creation to automatically generate new AtScale data models from a single DataFrame or table in your warehouse.
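To make the writeback capability concrete, here is a hedged sketch of training a model on features pulled through AI-Link and pushing the predictions back for BI consumption. The feature names, connection object, and exact `writeback` parameters are assumptions for illustration:

```python
# Illustrative round trip; `data_model` and `warehouse_connection` are assumed
# to exist (see the earlier sketches), and all names/signatures are assumptions.
from sklearn.linear_model import LinearRegression

# Pull business-vetted features into a pandas DataFrame.
df = data_model.get_data(feature_list=["month", "ad_spend", "total_sales"])

# Fit a simple model on the fetched features.
model = LinearRegression().fit(df[["ad_spend"]], df["total_sales"])
df["predicted_sales"] = model.predict(df[["ad_spend"]])

# Push the predictions back so BI tools on the Semantic Layer can consume them.
data_model.writeback(
    dbconn=warehouse_connection,     # hypothetical SQLConnection
    table_name="sales_predictions",  # hypothetical target table
    dataframe=df,
)
```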
What distinguishes the different versions of `get_data`?
Our family of `get_data` functions allows users to fetch data from a cube by simply naming the desired features and filters – no user-side query creation necessary. Here’s how those functions differ.
- `get_data` generates and executes a query against your database using the account permissions set up in your AtScale instance, returning the results in a pandas DataFrame. That is, the AtScale instance queries the database and passes the results back to the user. The data returned by this version of `get_data` is limited to the maximum row count specified in your AtScale Organization’s Engine settings; depending on the size of your data, you might prefer to use other varieties of this functionality.
- `get_data_direct` generates and submits a query directly to the database using credentials supplied in a `SQLConnection` object, returning results in a pandas DataFrame. You’ll be required to enter these database credentials to construct the `SQLConnection` object. This option removes constraints like row limitations otherwise present in `get_data` return values, making it useful for large data projects.
- `get_data_spark` generates and submits a query directly to the database using information native to the provided PySpark Session, returning the results in a PySpark DataFrame. This option removes constraints like row limitations otherwise present in `get_data` return values, making it useful for large data projects.
- `get_data_spark_jdbc` generates and submits a query directly to the database using the provided PySpark Session and JDBC connection information, returning results in a PySpark DataFrame. This option removes constraints like row limitations otherwise present in `get_data` return values, making it useful for large data projects. The sketch below contrasts the four variants.
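As a hedged illustration of how the variants are invoked – the `data_model` object, parameter names, and JDBC options here are assumptions for the sketch, not a verbatim API reference:

```python
# Illustrative comparison of the get_data variants; names and parameter
# signatures are assumptions, not a verbatim copy of the AI-Link API.
from pyspark.sql import SparkSession

features = ["region", "month", "total_sales"]  # hypothetical feature names

# 1. get_data: routed through the AtScale instance; returns a pandas
#    DataFrame, subject to the Organization's Engine row limit.
pdf = data_model.get_data(feature_list=features)

# 2. get_data_direct: queries the database directly via a SQLConnection
#    (e.g., the warehouse_connection built earlier); no row limit.
pdf_full = data_model.get_data_direct(
    dbconn=warehouse_connection,
    feature_list=features,
)

# 3. get_data_spark: uses the Spark session's own connectivity to the
#    database; returns a PySpark DataFrame.
spark = SparkSession.builder.getOrCreate()
sdf = data_model.get_data_spark(feature_list=features, spark_session=spark)

# 4. get_data_spark_jdbc: Spark session plus explicit JDBC connection
#    information; returns a PySpark DataFrame.
sdf_jdbc = data_model.get_data_spark_jdbc(
    feature_list=features,
    spark_session=spark,
    jdbc_format="jdbc",                          # hypothetical JDBC options
    jdbc_options={"url": "jdbc:your-warehouse"}, # placeholder URL
)
```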