2.3.0 Release Notes
This release of AI-Link introduces new optimized operations for programmatic interaction with AtScale and expands existing beta functionality:
- Auto Semantic Model Creation (Beta): ability to create generic time dimensions
- Semantic Inference (Beta): support for Databricks, plus testing across our strategic EDWs (Snowflake, GBQ, Databricks)
Please refer to our API documentation for the latest syntax to use with AI-Link. See below for updates associated with this release.
Auto Semantic Model Creation (Beta) Update
Date/Time Dimension Creation Support
Generate a standard dimension, from years down to days, that can be a shareable object across data models in AtScale. Auto Semantic Model Creation interprets the selected dataset and automatically creates a default date table, materialized as a date/time table spanning 1900-2100, with columns representing the date granularity levels (year, month, day) that generate a reusable hierarchy in AtScale. Users have the option not to write this table to their database in case they do not have write permissions. This helps users establish foundational, standardized date dimensions with improved query performance and role-playing for subsequent data modeling activities.
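For illustration, a minimal sketch of generating the date dimension during auto model creation. Only generate_date_table is documented in the changelog below; the import path, constructor arguments, and other keyword names are placeholders:

```python
from atscale.client import Client  # import path assumed from client.py::Client

# Placeholder connection details.
client = Client(server="https://atscale.example.com", username="user", organization="org")

# generate_date_table=True materializes a 1900-2100 date table in the source
# database and builds a reusable year/month/day hierarchy. Pass False to skip
# the write (e.g., without write permissions) and keep the previous behavior.
project = client.auto_gen_semantic_model(
    warehouse_id="wh_id",        # placeholder
    model_name="sales_model",    # placeholder
    dataset_name="sales_facts",  # placeholder
    generate_date_table=True,
)
```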
Semantic Inference (Beta) Updates
Databricks UDF Support: Semantic Inference now works with dbSQL* and Databricks SQL UDFs; users can package UDFs and data transformation queries directly into AtScale for on-demand execution from connected BI tools. Users no longer need to move data out of Databricks to support large-scale analysis and transformation of data models.
* Note: The Python version of UDFs is still in private preview on Databricks; this capability will be updated to work with those UDFs for ML model embedding.
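A rough sketch of packaging a Databricks SQL UDF into an AtScale model. The UDF definition is standard Databricks SQL; join_udf exists per the changelog below, but its arguments other than warehouse_id are illustrative:

```python
# Assume `project` is an AtScale Project already connected to Databricks.

# A standard Databricks SQL UDF, created beforehand via dbSQL:
#   CREATE OR REPLACE FUNCTION main.sales.to_usd(amount DOUBLE, rate DOUBLE)
#   RETURNS DOUBLE RETURN amount * rate;

# Package the UDF into the model so connected BI tools can invoke it
# on demand, without moving data out of Databricks.
project.join_udf(
    udf_name="main.sales.to_usd",  # illustrative argument name
    # warehouse_id may be omitted if one is already referenced in the project
)
```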
New Python Helper Functions for Programmatic Interaction
- Bulk Operations: reduces the number of endpoint calls and validation checks to improve performance when accessing or updating large quantities of data. This makes it much easier for Python developers - data engineers and data scientists - to create aggregate features, calculated features, and calculated columns, and to build key components of AtScale models for their use-cases on large tabular datasets. This accelerates query speed and strengthens functions for big-data semantic/ML modeling. (See the bulk_operator example in the changelog below.)
- get_database_query: creates and exposes a view of an optimized SQL query from AtScale, translated from our Python function get_data_direct, for data analysis and data engineering use-cases. Users can then execute the query in their target environment of choice. This helps data scientists and data engineers create an optimized SQL query that can then be tailored for various data transformation activities. (Additional details in the changelog below.)
- Perspective creation and deletion: allows users to better control their get_data calls by targeting specific elements of the data model that are of interest. This maximizes the potential for existing data models to serve different analytics initiatives. (Both helpers are sketched after this list.)
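A minimal sketch of the two helpers above, assuming data_model is an AI-Link DataModel; the feature and perspective names are placeholders:

```python
# Expose the optimized SQL that get_data_direct would otherwise run, so it
# can be reviewed or executed directly in the target warehouse.
sql = data_model.get_database_query(feature_list=["order_date", "total_sales"])
print(sql)

# Scope the model to the elements of interest by hiding everything else;
# exactly which arguments specify the hidden inputs is illustrative here.
data_model.create_perspective(name="sales_only")
```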
Non-Functional Updates
- Support for Snowpark DataFrames so AtScale can work with our customers' databases and strategic tools (e.g., Snowpark)
- UX quality-of-life improvements: more informative function execution to simplify the pythonic dialog for users
- Additional checks and validations
- Documentation site reformat and README updates based on the latest code manifest, making it easier to find key information and get started quickly
Changelog for Syntax Updates
client.py::Client
Updated Functions:
auto_gen_semantic_model
: Added parameter generate_date_table to create a materialized date table in the source database; must be set to False to keep the previous behavior
connection.py::Connection
Updated Functions:
__init__
: The server parameter now strips a trailing '/' to avoid an error from a copied Chrome URL. Connection requests are now made through a persisted session to save time on network communication
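For example (a sketch; the import path and any constructor arguments beyond server are assumptions):

```python
from atscale.connection import Connection  # import path assumed from connection.py::Connection

# A trailing '/' copied from the Chrome address bar no longer causes an
# error; the URL below resolves the same as one without the slash.
conn = Connection(server="https://atscale.example.com/", username="user")  # other args illustrative

# Subsequent requests through conn reuse one persisted HTTP session,
# avoiding per-request connection setup.
```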
data_model.py::DataModel
New Functions:
get_database_query
: Returns a database query generated using the AtScale model to get the given features
create_perspective
: Creates a perspective that hides the inputs, using the current data model as the base
delete
: Deletes the current data model from the project
bulk_operator
: Performs a specified operation for all provided parameters. Optimizes validation and API calls for better performance on large numbers of operations. Must be chained operations of the same underlying function, for example create_aggregate_feature.
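A minimal sketch of bulk_operator batching create_aggregate_feature calls, assuming data_model is a DataModel. The (function, parameter list) calling convention and the keyword names in each dict are assumptions based on the descriptions here and in the rename notes below:

```python
# Batch many create_aggregate_feature calls so validation and API round
# trips happen once, not once per feature.
parameter_list = [
    {
        "fact_dataset_name": "sales_facts",        # renamed parameter, see below
        "column_name": "amount",                   # illustrative
        "new_feature_name": f"sum_amount_{region}",
        "aggregation_type": "SUM",                 # illustrative
    }
    for region in ("east", "west", "north", "south")
]
data_model.bulk_operator(data_model.create_aggregate_feature, parameter_list)
```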
Renamed:
- add_queried_dataset is now add_query_dataset, to better reflect the function's operation. It now takes an additional optional parameter, create_hinted_aggregate, to generate an aggregate table with all measures and join keys in the QDS dataset to improve query performance (see the sketch after this list)
- join_table is now add_table, to better reflect the function's operation
- The dataset_name parameter has been renamed to fact_dataset_name in the following functions: DataModel.create_denormalized_categorical_feature, DataModel.create_aggregate_feature
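For example, a sketch of the renamed QDS helper with the new aggregate hint; the dataset name, query, and keyword names other than create_hinted_aggregate are placeholders:

```python
# Formerly add_queried_dataset. create_hinted_aggregate=True also builds an
# aggregate table over all measures and join keys in the QDS dataset.
data_model.add_query_dataset(
    dataset_name="recent_orders",                                    # placeholder
    query="SELECT * FROM orders WHERE order_date >= '2023-01-01'",   # placeholder
    create_hinted_aggregate=True,
)
```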
Optional New Parameters:
- get_features: Access features from the draft project by setting the use_published parameter to False. The dict returned by get_features now includes an atscale_type key, which maps to the values previously returned by data_type; data_type now returns the Python data type (see the combined sketch after this list)
- join_table: The join_features parameter is now optional, for instances in which a user would like to add a table to a data model with no joins
- get_hierarchies: The default value for the secondary_attribute parameter has changed so that the default behavior does not include secondary attributes in the response. This also removes the option to only return secondary attributes
- The warehouse_id parameter is now optional in the following methods (but is still needed if no warehouse_id is referenced in the project yet): DataModel.add_query_dataset, DataModel.writeback, DataModel.writeback_spark, DataModel.writeback_spark_to_spark, DataModel.write_feature_importance, DataModel.join_table
- The order_by parameter takes an ordered list of features and their explicit sort order; it was added to the following functions: DataModel.get_data, DataModel.get_data_direct, DataModel.get_data_jdbc, DataModel.get_data_spark, DataModel.get_data_spark_from_spark, DataModel.get_database_query
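A combined sketch of the new optional parameters; the (feature, direction) shape of order_by entries is an assumption based on "features and their explicit sort order", and the join_table argument name is illustrative:

```python
# Inspect draft (unpublished) features; each entry now carries
# 'atscale_type' (the former data_type values), while 'data_type'
# holds the Python type.
features = data_model.get_features(use_published=False)

# Add a table with no joins: join_features may now be omitted.
data_model.join_table(table_name="reference_data")  # argument name illustrative

# Explicit sort order on returned data.
df = data_model.get_data(
    feature_list=["order_date", "total_sales"],
    order_by=[("order_date", "asc"), ("total_sales", "desc")],  # shape assumed
)
```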
project.py::Project
Updated Functions:
select_data_model
: Now takes additional parameters data_model_id and name_contains so a user can more quickly find a data model to work with
join_udf
: The warehouse_id parameter is now optional, but is still needed if no warehouse id is referenced in the project (i.e., if this is a new project)
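For example (a sketch; the id and name fragment are placeholders):

```python
# Jump straight to a known model by id, or narrow the selection by a name
# substring; both parameters are optional and can be used independently.
data_model = project.select_data_model(data_model_id="abc123")
data_model = project.select_data_model(name_contains="sales")
```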
snowflake.py::Snowflake
Updated Functions:
__init__
: Now takes an additional optional parameter private_key to use key-pair authentication instead of a password
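A sketch of key-pair authentication. The import path, the connection arguments, and the key format expected by private_key are assumptions; the DER conversion below follows the Snowflake Python connector's key-pair convention:

```python
from cryptography.hazmat.primitives import serialization

from atscale.db.snowflake import Snowflake  # import path assumed from snowflake.py::Snowflake

# Load the PKCS#8 private key and convert it to DER bytes, as the Snowflake
# Python connector expects for key-pair authentication.
with open("rsa_key.p8", "rb") as key_file:
    pem_key = serialization.load_pem_private_key(key_file.read(), password=None)
der_key = pem_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

db = Snowflake(
    username="user",          # connection arguments are placeholders
    account="myaccount",
    warehouse="wh",
    database="db",
    schema="public",
    private_key=der_key,      # replaces the password argument
)
```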