
General FAQs

AI-Link is a Python package that exposes AtScale’s Semantic Layer for use in data-driven projects. These include data science projects that build on top of the Semantic Layer, as well as data engineering projects that refine and enhance the content of the Semantic Layer itself.

Designed to enable programmatic interaction with the Semantic Layer, AI-Link is not a mechanism to replicate the entire data modeling experience offered via the AtScale canvas.

AI-Link solves several common challenges faced when deriving value from data in a production environment:

  • It provides a uniform set of feature definitions. This gives data engineers, business intelligence analysts, data scientists, and compliance teams a unified understanding of your organization's data, regardless of who uses it and how.

  • It enables data professionals to stick to their tools of choice. While a BI analyst might work in platforms like Excel or Tableau, a data scientist might prefer Python-based environments like Jupyter Notebooks and libraries like pandas, scikit-learn, and PyTorch. AI-Link facilitates collaboration among all contributors to a data initiative, regardless of the tools they use.

Within AtScale, your user account will need to have the Design Center and Runtime Query User permissions.

Within your data warehouse you will need:

  • Read access to the data warehouse, which the get_data_direct function requires. This function lets users query the data warehouse beneath the Semantic Layer without using the AtScale service account for query execution.

Basic requirements include a Python version no earlier than 3.9, a coding environment (e.g., a Jupyter Notebook), and an existing AtScale data model. Please refer to the Getting Started section for more detailed prerequisites.
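As a quick way to confirm the Python requirement before installing anything, a check along the following lines could be run in the target environment. This is a minimal sketch using only the standard library; the helper name python_is_supported and the MIN_PYTHON constant are illustrative and not part of AI-Link.

```python
import sys

# Minimum Python version stated in the requirements above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum.

    Illustrative helper only; it is not provided by AI-Link.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the 3.9 minimum.")
    else:
        print(f"Python {current} is older than the 3.9 minimum.")
```

The check covers only the interpreter version; the remaining prerequisites are listed in the Getting Started section.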

AI-Link is designed to augment existing AtScale data models with insights (e.g., predictive models) generated by code-first data professionals. AI-Link helps businesses answer questions such as:

  • What will my sales look like next year?
  • Which of our customer cohorts will contribute the most to our revenue?
  • When should we replace the machinery on our manufacturing sites?

AI-Link is a Python library; it currently does not support any other programming languages.

Some key capabilities include:

  • A business-vetted library of features to ensure consistency across your organization.

  • Pandas and Spark DataFrame support through a Python connector to enable working with any open-source library, machine learning framework, or AutoML tool.

What distinguishes the different versions of get_data?

Our family of get_data functions allows users to fetch data from a cube by simply naming the desired features and filters – no user-side query creation necessary. Here’s how those functions differ.

  • get_data generates and executes a query against your database using the account permissions set up in your AtScale instance, returning the results in a pandas DataFrame. That is, the AtScale instance queries the database and passes the results back to the user. The data returned by this version of get_data is limited to the maximum row count specified in your AtScale Organization’s Engine settings; depending on the size of your data, you might prefer to use other varieties of this functionality.

  • get_data_direct generates and submits a query directly to the database using credentials supplied in a SQLConnection object, returning results in a pandas DataFrame. You’ll be required to enter these database credentials to construct the SQLConnection object. This option removes constraints like row limitations otherwise present in get_data return values, making it useful for large data projects.

  • get_data_spark generates and submits a query directly to the database using information native to the provided PySpark Session, returning the results in a PySpark DataFrame. This option removes constraints like row limitations otherwise present in get_data return values, making it useful for large data projects.

  • get_data_spark_jdbc generates and submits a query directly to the database using the provided PySpark Session and JDBC connection information, returning results in a PySpark DataFrame. This option removes constraints like row limitations otherwise present in get_data return values, making it useful for large data projects.
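The differences above can be summarized as a small decision helper. This is a sketch of one possible rule of thumb, not AtScale guidance: it does not call AI-Link at all, it only returns the name of the variant that matches the situation described. The function choose_get_data_variant and all of its parameters are hypothetical names introduced for illustration.

```python
def choose_get_data_variant(
    expected_rows: int,
    engine_row_limit: int,
    spark_session_available: bool = False,
    jdbc_info_available: bool = False,
    warehouse_credentials_available: bool = False,
) -> str:
    """Suggest a get_data variant by name (hypothetical helper, not in AI-Link).

    expected_rows: rough size of the result you expect.
    engine_row_limit: maximum row count from your AtScale Engine settings.
    """
    # With a PySpark session, results come back as a PySpark DataFrame
    # and are not subject to the engine row limit.
    if spark_session_available:
        if jdbc_info_available:
            return "get_data_spark_jdbc"
        return "get_data_spark"

    # Without Spark, large results need direct database credentials
    # (used to build a SQLConnection object) to bypass the row limit.
    if expected_rows > engine_row_limit and warehouse_credentials_available:
        return "get_data_direct"

    # Default: query through the AtScale instance. If the result is
    # larger than the limit, it will be capped at engine_row_limit.
    return "get_data"


if __name__ == "__main__":
    print(choose_get_data_variant(5_000, 100_000))
    print(choose_get_data_variant(
        10_000_000, 100_000, warehouse_credentials_available=True
    ))
```

The helper prefers the Spark variants whenever a session is available, on the assumption that a PySpark DataFrame is the desired result type in that case; adjust the ordering if a pandas DataFrame is preferred.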