2.3.0 release notes

This release of AI-Link introduces new optimized operations for programmatic interaction with AtScale and expands existing beta functionality:

  • Auto Semantic Model Creation (Beta): the ability to create generic time dimensions
  • Semantic Inference (Beta): support for Databricks, plus testing to support strategic EDWs (Snowflake, GBQ, Databricks)

Please refer to our API documentation for the latest syntax to use with AI-Link. See below for updates associated with this release.

Auto Semantic Model Creation (Beta) Update

Date/Time Dimension Creation Support

Generate a standard dimension, from years down to days, that can be a shareable object across data models in AtScale. Auto Semantic Model Creation interprets the selected dataset and automatically creates a materialized date/time table covering 1900-2100, with columns for each date granularity level (year, month, day) that generate a reusable hierarchy in AtScale. Users who do not have write permissions on their database can opt not to write this table. This helps users establish foundational, standardized date dimensions with improved query performance and role-playing for subsequent data modeling activities.
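The shape of the materialized date table described above can be pictured with a small, plain-Python sketch. This is only an illustration of the year/month/day granularity columns named in these notes; the table AtScale actually writes may include additional columns:

```python
from datetime import date, timedelta

def date_dimension_rows(start=date(1900, 1, 1), end=date(2100, 12, 31)):
    """Yield one row per day with year/month/day granularity columns,
    mirroring the materialized 1900-2100 date table described above."""
    current = start
    while current <= end:
        yield {
            "date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
        }
        current += timedelta(days=1)

# Inspect the first row without materializing all ~73,000 rows.
first_row = next(date_dimension_rows())
```

Each distinct granularity column becomes a level in the reusable hierarchy AtScale builds from the table.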

Semantic Inference (Beta) Updates

Databricks UDF Support: Semantic Inference now works with dbSQL* and Databricks SQL UDFs; users can package UDFs and data transformation queries directly into AtScale for on-demand execution from connected BI tools. Users no longer need to move data out of Databricks to support large-scale analysis and transformation of data models.

* Note: The Python version of UDFs is still in private preview on Databricks; this capability will be updated to work with those UDFs for embedding ML models.
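As an illustration, a Databricks SQL UDF of the kind that can now be packaged with a model looks like the following. The commented join_udf call is a hypothetical usage sketch based on the Project changelog in this release, not the confirmed signature; the function and catalog names are invented:

```python
# A simple Databricks SQL UDF definition, held as a string so it can be
# packaged with the model for on-demand execution from BI tools.
udf_definition = """
CREATE OR REPLACE FUNCTION sales.default.usd_to_eur(amount DOUBLE)
RETURNS DOUBLE
RETURN amount * 0.92
"""

# project.join_udf("usd_to_eur", udf_definition)  # assumed signature
```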

New Python Helper Functions for Programmatic Interaction

  • Bulk Operations: reduce the number of endpoint checks to improve performance when accessing or updating large quantities of data. This makes it much easier for Python developers (data engineers and data scientists) to create aggregate features, calculated features, and calculated columns, key components of AtScale models, from large tabular datasets. It also accelerates query speed and strengthens functions for big-data semantic/ML modeling.
  • get_database_query: creates and exposes a view of an optimized SQL query from AtScale, translated from our Python function get_data_direct, for data analysis and data engineering use cases. Users can then execute this query in their target environment of choice and tailor it for various data transformation activities. (Additional details in the changelog below.)
  • Perspective creation and deletion: allows users to better control their get_data calls by targeting specific elements of the data model that are of interest. This maximizes the potential for existing data models to serve different analytics initiatives.
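A workflow built on get_database_query might look like the sketch below. The feature_list parameter name is an assumption based on these notes, and the model argument stands in for a live AI-Link DataModel:

```python
def export_model_sql(model, features, path):
    """Fetch the AtScale-optimized SQL for the given features and save
    it to a file so it can be tailored and executed in the target
    environment of choice.

    `model` is assumed to be an AI-Link DataModel; the feature_list
    parameter name is an assumption based on these release notes.
    """
    sql = model.get_database_query(feature_list=features)
    with open(path, "w") as handle:
        handle.write(sql)
    return sql
```

The saved query can then be handed to data engineers for downstream transformation work without a live AI-Link session.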

Non-Functional Updates

  • Support for Snowpark DataFrames, so AtScale can work with our customers' databases and strategic tools (e.g. Snowpark)
  • UX quality-of-life improvements: informative function execution messages to simplify the Python dialog for users
  • Additional checks and validations
  • Documentation site reformat and README updates based on the latest code manifest, making it easier to find key information to get started quickly

Changelog for Syntax Updates

client.py::Client

Updated Functions:

  • auto_gen_semantic_model: Added the generate_date_table parameter, which creates a materialized date table in the source database; set it to False to retain the previous behavior

connection.py::Connection

Updated Functions:

  • __init__: The server parameter now strips a trailing '/' to avoid errors from a URL copied out of Chrome. Connection requests are now made through a persisted session to reduce time spent on network communication
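The trailing-slash handling amounts to the following normalization (a behavioral sketch, not the library's actual code):

```python
def normalize_server(server: str) -> str:
    """Strip trailing '/' so a URL copied from the browser address bar,
    e.g. 'https://atscale.example.com/', still connects cleanly."""
    return server.rstrip("/")
```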

data_model.py::DataModel

New Functions:

  • get_database_query: Returns a database query, generated using the AtScale model, to retrieve the given features
  • create_perspective: Creates a perspective that hides the inputs, using the current data model as the base
  • delete: Deletes the current data model from the project
  • bulk_operator: Performs a specified operation for all provided parameters. Optimizes validation and API calls for better performance on large numbers of operations. The chained operations must use the same underlying function, for example create_aggregate_feature.
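A hypothetical bulk_operator call might batch many create_aggregate_feature operations at once. The parameter dict fields follow names that appear elsewhere in this changelog, but the exact bulk_operator signature is an assumption:

```python
# Build one parameter dict per aggregate feature to create in a single
# validated batch. Field names follow this changelog; the commented
# bulk_operator call is an assumed signature.
columns = ["amount", "quantity", "discount"]
param_sets = [
    {
        "fact_dataset_name": "sales",
        "column_name": col,
        "new_feature_name": f"sum_{col}",
        "aggregation_type": "SUM",
    }
    for col in columns
]

# model.bulk_operator(model.create_aggregate_feature, param_sets)  # assumed
```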

Renamed:

  • add_queried_dataset is now add_query_dataset, to better reflect the function's operation. It also takes a new optional parameter, create_hinted_aggregate, which generates an aggregate table with all measures and join keys in the QDS dataset to improve query performance

  • join_table is now add_table, to better reflect the function's operation

  • The dataset_name parameter has been renamed to fact_dataset_name in the following functions:

    • DataModel.create_denormalized_categorical_feature
    • DataModel.create_aggregate_feature

Optional New Parameters:

  • get_features: Access features from the draft project by setting the use_published parameter to False. The dict returned by get_features now includes an atscale_type key, which maps to the values previously returned by data_type; data_type now returns the Python data type
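Code that previously inspected data_type for AtScale type names should read atscale_type instead. The dict below is an assumed illustration of the described keys, with invented feature names:

```python
# Assumed illustration of the described result shape: atscale_type now
# carries the AtScale type name, while data_type is the Python type.
features = {
    "total_sales": {"atscale_type": "Aggregate", "data_type": "float"},
    "region": {"atscale_type": "Categorical", "data_type": "str"},
}

aggregates = [name for name, info in features.items()
              if info["atscale_type"] == "Aggregate"]
```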

  • join_table: The join_features parameter is now optional, for instances in which a user would like to add a table to a data model with no joins

  • get_hierarchies: The default value of the secondary_attribute parameter has changed so that, by default, secondary attributes are not included in the response. This also removes the option to return only secondary attributes

  • The warehouse_id parameter is now optional in the following methods (but is still needed if no warehouse_id is referenced in the project yet):

    • DataModel.add_query_dataset
    • DataModel.writeback
    • DataModel.writeback_spark
    • DataModel.writeback_spark_to_spark
    • DataModel.write_feature_importance
    • DataModel.join_table
  • The new order_by parameter takes an ordered list of features and their explicit sort order. It was added to the following functions:

    • DataModel.get_data
    • DataModel.get_data_direct
    • DataModel.get_data_jdbc
    • DataModel.get_data_spark
    • DataModel.get_data_spark_from_spark
    • DataModel.get_database_query
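The notes describe order_by as an ordered list of features paired with explicit sort orders; a list of (feature, direction) tuples is one assumed encoding, so check the API documentation for the accepted values. The get_data call is commented because it requires a live DataModel:

```python
# Assumed encoding of the order_by argument: (feature, direction) pairs,
# applied in order. Feature names here are invented.
order_by = [("year", "asc"), ("total_sales", "desc")]

# df = model.get_data(feature_list=["year", "total_sales"],
#                     order_by=order_by)  # assumed usage
```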

project.py::Project

Updated Functions:

  • select_data_model: Now takes additional parameters data_model_id and name_contains so a user can more quickly find a data model to work with
  • join_udf: The warehouse_id parameter is now optional, but is still needed if no warehouse ID is referenced in the project (i.e. if this is a new project)
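The new selection parameters might be used as sketched below; the project argument stands in for a live AI-Link Project, and the wrapper exists only so the snippet can be exercised with any object exposing select_data_model:

```python
def pick_data_model(project, fragment):
    """Select a data model by partial name using the new name_contains
    parameter (project is assumed to be an AI-Link Project)."""
    return project.select_data_model(name_contains=fragment)
```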

snowflake.py::Snowflake

Updated Functions:

  • __init__: Now takes an additional optional parameter, private_key, to use key-pair authentication instead of a password
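Key-pair authentication presumably takes the key material in place of a password. The loader below only reads key bytes from disk; the commented constructor call is an assumed usage with invented argument values, not the confirmed signature, and the expected key format should be checked against the API docs:

```python
from pathlib import Path

def load_private_key(path):
    """Read raw private-key bytes to pass as the new private_key
    parameter (assumed input; see the API docs for the expected format)."""
    return Path(path).read_bytes()

# conn = Snowflake(username="analyst", account="acme-xy12345",
#                  warehouse="WH", database="DB", schema="PUBLIC",
#                  private_key=load_private_key("rsa_key.p8"))  # assumed
```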