General FAQs
What is AI-Link?
AI-Link is a Python package that exposes AtScale’s Semantic Layer to data-driven projects. These include Data Science projects that build on top of the Semantic Layer, as well as Data Engineering projects that refine and enhance the content of the Semantic Layer itself.
Designed to enable programmatic interaction with the Semantic Layer, AI-Link is not a mechanism to replicate the entire data modeling experience offered via the AtScale canvas.
Why do I need AI-Link?
AI-Link solves several common challenges faced when deriving value from data in a production environment:
- It provides a uniform set of feature definitions. This gives data engineers, business intelligence analysts, data scientists, and compliance teams a unified understanding of your organization's data, regardless of who uses it and how.
- It enables data professionals to stick to their tools of choice. While a BI analyst might work in platforms like Excel or Tableau, a data scientist might prefer Python-based environments like Jupyter Notebooks and libraries like pandas, scikit-learn, and PyTorch. AI-Link facilitates collaboration among all contributors to a data initiative, regardless of the tools they use.
What user permissions are necessary in AtScale and my data warehouse to use AI-Link?
Within AtScale, your user account will need to have the Design Center and Runtime Query User permissions.
Within your data warehouse, you will need:
- Read access, which is required for the get_data_direct function. This allows users to query the data warehouse beneath the Semantic Layer directly, without using the AtScale service account for query execution.
Is there anything I need to do before I start using AI-Link?
Basic requirements include a Python version no earlier than 3.9, a coding environment (e.g., a Jupyter Notebook), and an existing AtScale data model. Please refer to the Getting Started section for more detailed prerequisites.
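One way to confirm the Python requirement before installing is to check the interpreter version. The following is a minimal sketch; the helper name meets_minimum_python is hypothetical and is not part of AI-Link.

```python
import sys

# Minimum Python version stated in this FAQ.
MIN_PYTHON = (3, 9)


def meets_minimum_python(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or current) interpreter is at least `minimum`.

    Hypothetical helper for illustration; not part of the AI-Link package.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch releases do not affect the result.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum_python():
        print(f"Python {current} meets the 3.9+ requirement.")
    else:
        print(f"Python {current} is too old; 3.9 or later is required.")
```

Running this in the same environment (for example, the same Jupyter kernel) where AI-Link will be used avoids checking a different interpreter by mistake.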
What sorts of use cases does AI-Link address?
AI-Link is designed to augment existing AtScale data models with insights (e.g., predictive models) generated by code-first data professionals. AI-Link helps businesses answer questions such as:
- What will my sales look like next year?
- Which of our customer cohorts will contribute the most to our revenue?
- When should we replace the machinery on our manufacturing sites?
Is AI-Link available in any other programming languages?
AI-Link is a Python library; it currently does not support any other programming languages.
What are the key capabilities of AI-Link?
Some key capabilities include:
- A business-vetted library of features to ensure consistency across your organization.
- Pandas and Spark DataFrame support through a Python connector to enable working with any open-source library, machine learning framework, or AutoML tool.
What distinguishes the different versions of get_data?
Our family of get_data functions allows users to fetch data from a cube by simply naming the desired features and filters – no user-side query creation necessary. Here’s how those functions differ.
- get_data generates and executes a query against your database using the account permissions set up in your AtScale instance, returning the results in a pandas DataFrame. That is, the AtScale instance queries the database and passes the results back to the user. The data returned by this version of get_data is limited to the maximum row count specified in your AtScale Organization’s Engine settings; depending on the size of your data, you might prefer to use other varieties of this functionality.
- get_data_direct generates and submits a query directly to the database using credentials supplied in a SQLConnection object, returning results in a pandas DataFrame. You’ll be required to enter these database credentials to construct the SQLConnection object. This option removes constraints like row limitations otherwise present in get_data return values, making it useful for large data projects.
- get_data_spark generates and submits a query directly to the database using information native to the provided PySpark Session, returning the results in a PySpark DataFrame. This option removes constraints like row limitations otherwise present in get_data return values, making it useful for large data projects.
- get_data_spark_jdbc generates and submits a query directly to the database using the provided PySpark Session and JDBC connection information, returning results in a PySpark DataFrame. This option removes constraints like row limitations otherwise present in get_data return values, making it useful for large data projects.
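The differences between these functions can be summarized as a small decision helper. This is a sketch only: choose_get_data_variant and its parameters are hypothetical names introduced for illustration. The helper returns a function name as a string and does not call AI-Link; the actual function signatures should be taken from the AI-Link API reference.

```python
def choose_get_data_variant(want_spark_dataframe,
                            has_jdbc_info=False,
                            may_exceed_row_limit=False):
    """Suggest a get_data variant name based on the descriptions in this FAQ.

    Hypothetical helper for illustration; it does not import or call AI-Link.
    """
    if want_spark_dataframe:
        # Both Spark variants return a PySpark DataFrame; the JDBC variant
        # additionally uses JDBC connection information.
        return "get_data_spark_jdbc" if has_jdbc_info else "get_data_spark"
    if may_exceed_row_limit:
        # Queries the database directly with credentials from a SQLConnection
        # object, so the engine's maximum row count does not apply.
        return "get_data_direct"
    # Default: AtScale runs the query; results are capped by the maximum row
    # count in the AtScale Organization's Engine settings.
    return "get_data"


if __name__ == "__main__":
    print(choose_get_data_variant(want_spark_dataframe=False))
    print(choose_get_data_variant(want_spark_dataframe=False, may_exceed_row_limit=True))
    print(choose_get_data_variant(want_spark_dataframe=True, has_jdbc_info=True))
```

The helper only encodes the trade-offs described above: the result type (pandas or PySpark DataFrame), who executes the query, and whether the engine row limit applies.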