data_model
class atscale.data_model.data_model.DataModel
Creates an object corresponding to an AtScale Data Model. Takes an existing model id and AtScale Project object to construct an object that deals with functionality related to model datasets and columns, as well as reading data and writing back predictions.
add_column_mapping
Adds a new mapping to an existing column mapping
- Parameters:
- dataset_name (str) – The dataset the mapping belongs to.
- column_name (str) – The column the mapping belongs to.
- mapped_name (str) – The name for the new mapped column.
- data_type (enums.MappedColumnDataTypes) – The data type of the new mapped column.
- publish (bool, optional) – Whether the updated project should be published. Defaults to True.
add_level_to_hierarchy
Adds a new level to the given hierarchy
- Parameters:
- new_level_name (str) – The name of the level in the new hierarchy.
- hierarchy_name (str) – The name of the hierarchy to create in the dimension.
- dataset_name (str) – The name of the dataset to use.
- value_column (str) – The value column in the dataset to use for the level.
- existing_level (str) – The existing level to insert the new one at.
- key_columns (List[str], optional) – The key columns in the dataset to use for the level. Defaults to None to use the value column.
- add_above_existing (bool , optional) – Whether the new level should be inserted above the existing one. Defaults to True.
- level_type (enums.TimeSteps , optional) – The enums.TimeSteps for the level if time based. Defaults to enums.TimeSteps.Regular.
- caption (str , optional) – The caption for the level. Defaults to None to use new_level_name.
- description (str , optional) – The description for the level. Defaults to “”.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
add_table
Add a table in the data warehouse to the data model
- Parameters:
- table_name (str) – The table to join
- database (str) – The database the table belongs to if relevant for the data warehouse.
- schema (str) – The schema the table belongs to if relevant for the data warehouse.
- join_features (List[str], optional) – The feature query names in the data model to join on. Defaults to None to create no joins.
- join_columns (list, optional) – The column names in the table to join to the join_features. The list must be either None or the same length and order as join_features. Defaults to None to use identical names to the join_features. If multiple columns are needed for a single join, they should be in a nested list.
- roleplay_features (List[str], optional) – The roleplays to use on the relationships. The list must be either None or the same length and order as join_features. Use ‘’ to not roleplay that relationship. Defaults to None.
- warehouse_id (str, optional) – The id of the warehouse that the data model and this dataset point at.
- allow_aggregates (bool , optional) – Whether to allow aggregates to be built off of the dataset. Defaults to True.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
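The pairing between join_features and join_columns, including the nested-list case for multi-column joins, can be sketched as follows. All table, column, and feature names here are hypothetical, and `model` stands for a DataModel instance obtained elsewhere:

```python
# Hypothetical feature and column names; `model` would be an
# atscale DataModel instance in real use.
join_features = ["customer", "date_month"]

# One entry per join feature, in the same order. A nested list
# supplies multiple columns for a single join.
join_columns = [["cust_first_name", "cust_last_name"], "month_start"]

# The two lists must stay aligned positionally.
assert len(join_features) == len(join_columns)

# model.add_table(
#     table_name="fact_sales",
#     database="analytics",
#     schema="public",
#     join_features=join_features,
#     join_columns=join_columns,
# )
```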
bulk_operator
Performs a specified operation for each set of parameters in the list, optimizing validation and API calls for better performance on large numbers of operations. All calls must be to the same underlying function, for example create_aggregate_feature.
- Parameters:
- function (Callable) – The function to call a number of times.
- parameter_list (List[Dict]) – The list of parameters for each function call.
- error_limit (int , optional) – Defaults to 5, the maximum number of similar errors to collect before abbreviating.
- return_error_dict (bool) – If the function should return a dictionary of dictionaries when failures are found. Defaults to False to raise the error list at the time the error is found.
- continue_on_errors (bool) – If the function should commit changes for all inputs without errors. Defaults to False to not push any changes in the event of an error.
- publish (bool , optional) – Defaults to True, whether the updated project should be published
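A sketch of what a parameter_list might look like, assuming the target function is create_aggregate_feature; the dataset, column, and feature names are hypothetical:

```python
# Each dict holds the keyword arguments for one call of the same
# underlying function (here, create_aggregate_feature).
parameter_list = [
    {
        "fact_dataset_name": "fact_sales",
        "column_name": "sales_amount",
        "new_feature_name": "total_sales",
    },
    {
        "fact_dataset_name": "fact_sales",
        "column_name": "order_quantity",
        "new_feature_name": "total_quantity",
    },
]

# model.bulk_operator(
#     function=model.create_aggregate_feature,
#     parameter_list=parameter_list,
#     publish=False,
# )
```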
clone
Clones the current DataModel in its project and sets the name to the given query_name.
- Parameters:
- query_name (str) – The name for the newly cloned model.
- description (str , optional) – The description of the model. Defaults to “”.
- publish (bool , optional) – Whether to publish the project after creating the model. Defaults to True.
- Returns: The DataModel object for the new model.
- Return type: DataModel
column_exists
Checks if the given column name exists in the dataset.
- Parameters:
- dataset_name (str) – the name of the dataset we pull the columns from, case-sensitive.
- column_name (str) – the name of the column to check, case-sensitive
- Returns: true if name found, else false.
- Return type: bool
create_aggregate_feature
Creates a new aggregate feature.
- Parameters:
- fact_dataset_name (str) – The fact dataset containing the column that the feature will use.
- column_name (str) – The column that the feature will use.
- new_feature_name (str) – The query name of the new feature.
- aggregation_type (atscale.utils.enums.enums.Aggs) – The aggregation method to use for the feature, for example enums.Aggs.MAX. Valid options can be found in utils.enums.Aggs.
- description (str) – The description for the feature. Defaults to None.
- caption (str) – The caption for the feature. Defaults to None.
- folder (str) – The folder to put the feature in. Defaults to None.
- format_string (Union[enums.FeatureFormattingType, str]) – The format string for the feature. Defaults to None.
- visible (bool , optional) – Whether the feature will be visible to BI tools. Defaults to True.
- publish (bool) – Whether the updated project should be published. Defaults to True.
create_calculated_column
Creates a new calculated column. A calculated column is a column whose value is calculated by a SQL expression (referencing one or more columns from the dataset) run at query time for each row. See AtScale documentation for more info on calculated columns.
- Parameters:
- dataset_name (str) – The dataset the calculated column will be derived in.
- column_name (str) – The name of the column.
- expression (str) – The SQL expression for the column.
- publish (bool) – Whether the updated project should be published. Defaults to True.
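A sketch of the SQL expression a calculated column might use; the dataset and column names are hypothetical, and `model` stands for a DataModel instance:

```python
# A calculated column is defined by a SQL expression referencing
# existing columns of the dataset, evaluated per row at query time.
expression = "first_name || ' ' || last_name"

# model.create_calculated_column(
#     dataset_name="dim_customer",
#     column_name="full_name",
#     expression=expression,
# )
```

The exact SQL dialect accepted depends on the underlying data warehouse.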
create_calculated_feature
Creates a new calculated feature given a query name and an MDX Expression.
- Parameters:
- new_feature_name (str) – The query name of the new feature.
- expression (str) – The MDX expression for the feature.
- description (str) – The description for the feature. Defaults to None.
- caption (str) – The caption for the feature. Defaults to None to use the new_feature_name.
- folder (str) – The folder to put the feature in. Defaults to None.
- format_string (Union[enums.FeatureFormattingType, str]) – The format string for the feature. Defaults to None.
- visible (bool) – Whether the feature will be visible to BI tools. Defaults to True.
- publish (bool) – Whether the updated project should be published. Defaults to True.
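A sketch of an MDX expression for a calculated feature; the measure names are hypothetical, and `model` stands for a DataModel instance:

```python
# An MDX expression dividing one existing measure by another to
# produce an average-order-value measure.
expression = "[Measures].[total_sales] / [Measures].[order_count]"

# model.create_calculated_feature(
#     new_feature_name="avg_order_value",
#     expression=expression,
#     folder="Derived Measures",
# )
```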
create_dataset
Creates a dataset in the data model.
- Parameters:
- dataset_name (str) – The display and query name of the dataset to create.
- table_name (str , optional) – The name of the table to use. Defaults to None to use a query.
- database (str , optional) – The database of the table to use. Defaults to None to use a query.
- schema (str , optional) – The schema of the table to use. Defaults to None to use a query.
- query (str , optional) – The sql query if creating a query dataset. Defaults to None to use a table.
- warehouse_id (str , optional) – The warehouse to associate the dataset with. Defaults to None to infer from the model.
- allow_aggregates (bool , optional) – The new setting for if aggregates are allowed to be built off of this dataset. Defaults to True.
- create_hinted_aggregate (bool , optional) – The setting for if an aggregate table is generated for all measures and keys if this is a QDS. Defaults to False.
- dimension_only (bool , optional) – Whether the dataset is only going to be used for dimensions. Defaults to False.
- incremental_indicator (string , optional) – The indicator column for incremental builds. Defaults to None to not enable incremental builds.
- grace_period (int , optional) – The grace period for incremental builds. Defaults to 0.
- safe_to_join_to_incremental (bool , optional) – Whether it is safe to join from this dataset to one with incremental builds enabled. Defaults to False.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
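As the parameter defaults indicate, create_dataset works in one of two modes: pointing at a physical table via table_name/database/schema, or supplying a SQL query. A sketch of both argument sets, with hypothetical names:

```python
# Table-backed dataset: table_name/database/schema are given,
# query is left as None.
table_args = dict(
    dataset_name="sales_table_ds",
    table_name="fact_sales",
    database="analytics",
    schema="public",
)

# Query dataset (QDS): a SQL query is given instead of a table.
query_args = dict(
    dataset_name="sales_query_ds",
    query="SELECT order_id, sales_amount FROM analytics.public.fact_sales",
)

# model.create_dataset(**table_args)
# model.create_dataset(**query_args)
```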
create_dataset_relationship
Creates a relationship between a dataset and features in the model
- Parameters:
- dataset_name (str) – The dataset to join
- join_features (List[str]) – The feature query names in the data model to join on.
- join_columns (list, optional) – The column names in the dataset to join to the join_features. The list must be either None or the same length and order as join_features. Defaults to None to use identical names to the join_features. If multiple columns are needed for a single join, they should be in a nested list.
- roleplay_features (List[str], optional) – The roleplays to use on the relationships. The list must be either None or the same length and order as join_features. Use ‘’ to not roleplay that relationship. Defaults to None.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
create_denormalized_categorical_feature
Creates a new denormalized categorical feature.
- Parameters:
- fact_dataset_name (str) – The name of the fact dataset to find the column_name.
- column_name (str) – The column that the feature will use.
- new_feature_name (str) – The query name of the new feature.
- description (str , optional) – The description for the feature. Defaults to None.
- caption (str , optional) – The caption for the feature. Defaults to None.
- folder (str , optional) – The folder to put the feature in. Defaults to None.
- visible (bool , optional) – Whether the feature will be visible to BI tools. Defaults to True.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
create_dimension
Creates a dimension with one hierarchy and level
- Parameters:
- new_dimension_name (str) – The name of the new dimension to create.
- new_hierarchy_name (str) – The name of the hierarchy to create in the dimension.
- new_level_name (str) – The name of the level in the new hierarchy.
- dataset_name (str) – The name of the dataset to use.
- level_value_column (str) – The value column in the dataset to use for the level.
- level_key_columns (List[str], optional) – The key columns in the dataset to use for the level. Defaults to None to use the value column.
- time_dimension (bool , optional) – Whether to flag the dimension as a time dimension. Defaults to False.
- dimension_description (str , optional) – The description for the dimension. Defaults to “”.
- hierarchy_caption (str, optional) – The caption for the hierarchy. Defaults to None to use new_hierarchy_name.
- hierarchy_description (str , optional) – The description for the hierarchy. Defaults to “”.
- hierarchy_folder (str , optional) – The folder for the hierarchy. Defaults to “”.
- level_type (enums.TimeSteps , optional) – The enums.TimeSteps for the level if time based. Defaults to enums.TimeSteps.Regular.
- level_caption (str , optional) – The caption for the level. Defaults to None to use new_level_name.
- level_description (str , optional) – The description for the level. Defaults to “”.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
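A sketch of the arguments for a time dimension with a single hierarchy and level; all names are hypothetical, and `model` stands for a DataModel instance:

```python
# Arguments for a date dimension built from a hypothetical
# dim_date dataset.
dimension_args = dict(
    new_dimension_name="Order Date",
    new_hierarchy_name="Order Date Hierarchy",
    new_level_name="order_day",
    dataset_name="dim_date",
    level_value_column="date_value",
    level_key_columns=["date_key"],  # defaults to the value column if None
    time_dimension=True,             # flag it as a time dimension
)

# model.create_dimension(**dimension_args)
```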
create_filter_attribute
Creates a new boolean secondary attribute to filter on a given subset of the level’s values.
- Parameters:
- new_feature_name (str) – The query name of the new attribute.
- hierarchy_name (str) – The query name of the hierarchy the level belongs to.
- level_name (str) – The query name of the level to apply the filter to.
- filter_values (List[str]) – The list of values to filter on.
- caption (str) – The caption for the feature. Defaults to None.
- description (str) – The description for the feature. Defaults to None.
- folder (str) – The folder to put the feature in. Defaults to None.
- visible (bool) – Whether the created attribute will be visible to BI tools. Defaults to True.
- publish (bool) – Whether or not the updated project should be published. Defaults to True.
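A sketch of a filter attribute that is true for a subset of a level's values; the hierarchy, level, and values are hypothetical, and `model` stands for a DataModel instance:

```python
# The new boolean attribute will be true for rows whose level value
# is in this list, and false otherwise.
filter_values = ["CA", "OR", "WA"]

# model.create_filter_attribute(
#     new_feature_name="is_west_coast",
#     hierarchy_name="Geography",
#     level_name="state",
#     filter_values=filter_values,
# )
```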
create_hierarchy
Creates a hierarchy with one level
- Parameters:
- new_hierarchy_name (str) – The name of the hierarchy to create in the dimension.
- dimension_name (str) – The dimension to add the hierarchy to.
- new_level_name (str) – The name of the level in the new hierarchy.
- dataset_name (str) – The name of the dataset to use.
- level_value_column (str) – The value column in the dataset to use for the level.
- level_key_columns (List[str], optional) – The key columns in the dataset to use for the level. Defaults to None to use the value column.
- hierarchy_caption (str, optional) – The caption for the hierarchy. Defaults to None to use new_hierarchy_name.
- hierarchy_description (str , optional) – The description for the hierarchy. Defaults to “”.
- hierarchy_folder (str , optional) – The folder for the hierarchy. Defaults to “”.
- level_type (enums.TimeSteps , optional) – The enums.TimeSteps for the level if time based. Defaults to enums.TimeSteps.Regular.
- level_caption (str , optional) – The caption for the level. Defaults to None to use new_level_name.
- level_description (str , optional) – The description for the level. Defaults to “”.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
create_lag_feature
Creates a lagged feature based on the numeric feature and time hierarchy passed in.
- Parameters:
- new_feature_name (str) – The query name of the feature to create.
- numeric_feature_name (str) – The query name of the numeric feature to lag.
- hierarchy_name (str) – The query name of the time hierarchy to use for lagging.
- level_name (str) – The query name of the hierarchy level to use for lagging.
- time_length (int) – The length of the lag.
- description (str , optional) – A description for the feature. Defaults to None.
- caption (str , optional) – A caption for the feature. Defaults to None.
- folder (str , optional) – The folder to put the feature in. Defaults to None.
- format_string (Union[enums.FeatureFormattingType, str], optional) – A format string for the feature. Defaults to None.
- visible (bool , optional) – Whether the feature should be visible. Defaults to True.
- publish (bool , optional) – Whether to publish the project after creating the feature. Defaults to True.
create_mapped_columns
Creates a mapped column. Maps a column that is a key value structure into one or more new columns with the name of the given key(s). Types for the source keys and columns, and new columns are required. Valid types include ‘Int’, ‘Long’, ‘Boolean’, ‘String’, ‘Float’, ‘Double’, ‘Decimal’, ‘DateTime’, and ‘Date’.
- Parameters:
- dataset_name (str) – The dataset the mapped column will be derived in.
- column_name (str) – The name of the column to map.
- mapped_names (List[str]) – The names of the mapped columns.
- data_types (List[enums.MappedColumnDataTypes]) – The types of the mapped columns.
- key_terminator (enums.MappedColumnKeyTerminator) – The key terminator. Valid values are ‘:’, ‘=’, and ‘^’.
- field_terminator (enums.MappedColumnFieldTerminator) – The field terminator. Valid values are ‘,’, ‘;’, and ‘|’.
- map_key_type (enums.MappedColumnDataTypes) – The mapping key type for all the keys in the origin column.
- map_value_type (enums.MappedColumnDataTypes) – The mapping value type for all values in the origin column.
- first_char_delimited (bool) – Whether the first character is delimited. Defaults to False.
- publish (bool) – Whether the updated project should be published. Defaults to True.
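The key-value layout being unpacked can be illustrated on plain data. With key terminator ‘:’ and field terminator ‘;’, a raw column value like the one below yields one new column per requested key. The parsing shown here is a conceptual sketch of what the mapping does, not AtScale code, and the names are hypothetical:

```python
# A raw key-value column value using ':' as key terminator and
# ';' as field terminator.
raw_value = "city:Seattle;zip:98101"

# Conceptual unpacking: split into fields, then split each field
# into key and value.
fields = dict(pair.split(":", 1) for pair in raw_value.split(";"))
assert fields == {"city": "Seattle", "zip": "98101"}

# model.create_mapped_columns(
#     dataset_name="dim_location",
#     column_name="attributes_kv",
#     mapped_names=["city", "zip"],
#     data_types=[...],  # enums.MappedColumnDataTypes values, one per name
#     key_terminator=...,
#     field_terminator=...,
#     map_key_type=...,
#     map_value_type=...,
# )
```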
create_percentage_features
Creates a set of features calculating the percentage of the given numeric_feature’s value compared to each non-leaf (i.e. non-base) level in the hierarchy. Works off of the published project.
- Parameters:
- numeric_feature_name (str) – The query name of the numeric feature to use for the calculation
- hierarchy_name (str) – The query name of the hierarchy to use for comparisons
- level_names (List[str], optional) – The query names for the subset of levels to make percentages for. If None, generates percentages for all non-leaf levels. Defaults to None.
- new_feature_names (List[str], optional) – The query names of the new columns. If None, generates names. If not None, it must be the same length and order as level_names. Defaults to None.
- description (str , optional) – The description for the feature. Defaults to None.
- caption (str , optional) – The caption for the new features. Defaults to None.
- folder (str , optional) – The folder to put the new features in. Defaults to None.
- format_string (Union[enums.FeatureFormattingType, str], optional) – The format string for the features. Defaults to None.
- visible (bool , optional) – Whether the feature will be visible to BI tools. Defaults to True.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
create_period_to_date_features
Creates a period-to-date calculation off of the published project.
- Parameters:
- numeric_feature_name (str) – The query name of the numeric feature to use for the calculation
- hierarchy_name (str) – The query name of the time hierarchy used in the calculation
- level_names (List[str], optional) – The query names for the subset of levels to make period-to-date calcs for. If None, generates period-to-date for all non-leaf levels. Defaults to None.
- new_feature_names (List[str], optional) – The query names of the new columns. If None, generates names. If not None, it must be the same length and order as level_names. Defaults to None.
- description (str , optional) – The description for the feature. Defaults to None.
- folder (str , optional) – The folder to put the feature in. Defaults to None.
- format_string (Union[enums.FeatureFormattingType, str], optional) – The format string for the feature. Defaults to None.
- visible (bool , optional) – Whether the feature will be visible to BI tools. Defaults to True.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
- Returns: A message containing the names of successfully created features
- Return type: str
create_perspective
Creates a perspective that hides the inputs, using the current data model as a base.
- Parameters:
- new_perspective_name (str) – The name for the new perspective, based on the current data model. Objects passed to the other parameters will be hidden by the perspective.
- dimensions (List[str], optional) – Dimensions to hide. Defaults to None.
- hierarchies (List[str], optional) – Query names of hierarchies to hide. Defaults to None.
- categorical_features (List[str], optional) – Query names of categorical features to hide. Defaults to None.
- numeric_features (List[str], optional) – Query names of numeric features to hide. Defaults to None.
- publish (bool , optional) – Whether to publish the updated project. Defaults to True.
- Returns: The DataModel object for the created perspective.
- Return type: DataModel
create_rolling_features
Creates a rolling calculated numeric feature for the given column. If no list of enums.MDXAggs is provided, rolling calc features will be made for Sum, Mean, Min, Max, and Stdev.
- Parameters:
- new_feature_name (str) – The query name for the new feature, will be suffixed with the agg type if multiple are being created.
- numeric_feature_name (str) – The query name of the numeric feature to use for the calculation
- hierarchy_name (str) – The query name of the time hierarchy used in the calculation
- level_name (str) – The query name of the level within the time hierarchy
- time_length (int) – The length of time the feature should be calculated over
- aggregation_types (List[enums.MDXAggs], optional) – The types of aggregation to do for the rolling calc. If None, all agg types are used. Defaults to None.
- description (str , optional) – The description for the feature. Defaults to None.
- caption (str , optional) – The caption for the feature. Defaults to None.
- folder (str , optional) – The folder to put the feature in. Defaults to None.
- format_string (Union[enums.FeatureFormattingType, str], optional) – The format string for the feature. Defaults to None.
- visible (bool , optional) – Whether the feature will be visible to BI tools. Defaults to True.
- publish (bool, optional) – Whether or not the updated project should be published. Defaults to True.
- Returns: A list containing the names of the newly-created features.
- Return type: List[str]
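When multiple aggregation types are requested, each created feature's name is suffixed with its agg type, which is why a list of names is returned. A naming sketch on plain data (the exact suffix format is an illustrative assumption, not confirmed by this reference):

```python
# Hypothetical base name and illustrative agg-type suffixes; the
# real suffixes come from the enums.MDXAggs values used.
new_feature_name = "sales_30day"
agg_suffixes = ["Sum", "Avg", "Min", "Max", "Stdev"]

expected_names = [f"{new_feature_name}_{suffix}" for suffix in agg_suffixes]
```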
create_secondary_attribute
Creates a new secondary attribute on an existing hierarchy and level.
- Parameters:
- dataset_name (str) – The dataset containing the column that the secondary attribute will use.
- new_feature_name (str) – What the attribute will be called.
- column_name (str) – The column that the secondary attribute will use.
- hierarchy_name (str) – The query name of the hierarchy to add the attribute to.
- level_name (str) – The query name of the level of the hierarchy to add the attribute to.
- description (str , optional) – The description for the attribute. Defaults to None.
- caption (str , optional) – The caption for the attribute. Defaults to None.
- folder (str , optional) – The folder for the attribute. Defaults to None.
- visible (bool , optional) – Whether or not the secondary attribute will be visible to BI tools. Defaults to True.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
create_time_differencing_feature
Creates a time over time subtraction calculation. For example, create_time_differencing on the feature ‘revenue’, time level ‘date’, and a length of 2 will create a feature calculating today’s revenue minus the revenue from two days ago.
- Parameters:
- new_feature_name (str) – The query name of the feature to create.
- numeric_feature_name (str) – The query name of the numeric feature to use for the calculation.
- hierarchy_name (str) – The query name of the time hierarchy used in the calculation.
- level_name (str) – The query name of the level within the time hierarchy
- time_length (int) – The length of the lag in units of the given level of the given hierarchy.
- description (str) – The description for the feature. Defaults to None.
- caption (str) – The caption for the feature. Defaults to None.
- folder (str) – The folder to put the feature in. Defaults to None.
- format_string (Union[enums.FeatureFormattingType, str]) – The format string for the feature. Defaults to None.
- visible (bool , optional) – Whether the feature should be visible. Defaults to True.
- publish (bool) – Whether the updated project should be published. Defaults to True.
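The documented revenue example can be demonstrated on plain data to show the arithmetic a time-differencing feature performs (this is an illustration of the calculation, not AtScale code):

```python
# Daily revenue values for five consecutive days.
revenue = [100, 120, 90, 150, 130]
time_length = 2  # units of the chosen level, here days

# Each output is today's value minus the value time_length periods ago;
# the first time_length rows have no prior value to subtract.
diff = [
    revenue[i] - revenue[i - time_length]
    for i in range(time_length, len(revenue))
]
assert diff == [-10, 30, 40]
```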
create_user_defined_aggregate
Creates a user defined aggregate containing the given categorical and numeric features. Calculated features cannot be added.
- Parameters:
- name (str) – The name of the aggregate.
- categorical_features (List[str], optional) – Categorical features to add. Defaults to None.
- numeric_features (List[str], optional) – Numeric features to add. Defaults to None.
- publish (bool , optional) – Whether to publish the updated project. Defaults to True.
- Returns: The id of the created aggregate
- Return type: str
property cube_id : str
Getter for the id of the source cube. If the DataModel is a perspective, this will return the reference id for the source cube.
- Returns: The id of the source cube.
- Return type: str
dataset_exists
Returns whether a given dataset_name exists in the data model, case-sensitive.
- Parameters:
- dataset_name (str) – the name of the dataset to try and find
- include_unused (bool , optional) – Also return the names of datasets in the project library, even if they are not used in the model. Defaults to False to only list datasets used in the model.
- Returns: true if name found, else false.
- Return type: bool
delete
Deletes the current data model from the project
- Parameters: publish (bool , optional) – Whether to publish the updated project. Defaults to True.
delete_measures
Deletes a list of measures from the DataModel. If a measure is referenced in any calculated measures and delete_children is not set, then the user will be prompted with a list of children measures and given the choice to delete them or abort.
- Parameters:
- measure_list (List[str]) – the query names of the measures to be deleted
- publish (bool , optional) – Defaults to True, whether the updated project should be published
- delete_children (bool, optional) – Defaults to None to prompt the user if any other measures are derived from a measure being deleted. If True, those dependent measures are deleted as well without a prompt; if False, the method aborts with no changes to the data model.
get_all_categorical_feature_names
Returns a list of all published categorical features (i.e. hierarchy levels and secondary attributes) in the given DataModel.
- Parameters: folder (str , optional) – The name of a folder in the DataModel containing features to exclusively list. Defaults to None to not filter by folder.
- Returns: A list of the query names of categorical features in the DataModel and, if given, in the folder.
- Return type: List[str]
get_all_numeric_feature_names
Returns a list of all published numeric features (i.e. aggregate and calculated measures) in the data model.
- Parameters: folder (str , optional) – The name of a folder in the data model containing measures to exclusively list. Defaults to None to not filter by folder.
- Returns: A list of the query names of numeric features in the data model and, if given, in the folder.
- Return type: List[str]
get_columns
Gets all currently visible columns in a given dataset, case-sensitive.
- Parameters: dataset_name (str) – the name of the dataset to get columns from, case-sensitive.
- Returns: the columns in the given dataset
- Return type: Dict
get_connected_warehouse
Returns the warehouse id utilized in this data_model
- Returns: the warehouse id
- Return type: str
get_data
Submits a query against the data model using the supplied information and returns the results in a pandas DataFrame. Be sure that values passed to filters match the data type of the feature being filtered.
- Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- filter_equals (Dict[str, Any], optional) – Filters results based on the feature equaling the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – Filters results based on the feature being greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – Filters results based on the feature being less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – Filters results based on the feature being greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – Filters results based on the feature being less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – Filters results based on the feature not equaling the value. Defaults to None.
- filter_in (Dict[str, list], optional) – Filters results based on the feature being contained in the values. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – Filters results based on the feature being between the values. Defaults to None.
- filter_like (Dict[str, str], optional) – Filters results based on the feature being like the clause. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – Filters results based on the feature being matched by the regular expression. Defaults to None.
- filter_null (List[str], optional) – Filters results to show null values of the specified features. Defaults to None.
- filter_not_null (List[str], optional) – Filters results to exclude null values of the specified features. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [(‘feature_name_1’, ‘DESC’), (‘feature_2’, ‘ASC’) …]. Defaults to None for AtScale Engine default sorting.
- group_by (List[str], optional) – The groupby order for the query. Accepts a list of feature query names. Defaults to None to group in the order of the categorical features.
- limit (int , optional) – Limit the number of results. Defaults to None for no limit.
- comment (str , optional) – A comment string to build into the query. Defaults to None for no comment.
- use_aggs (bool , optional) – Whether to allow the query to use aggs. Defaults to True.
- gen_aggs (bool , optional) – Whether to allow the query to generate aggs. Defaults to True.
- fake_results (bool , optional) – Whether to use fake results, often used to train aggregates with queries that will frequently be used. Defaults to False.
- use_local_cache (bool , optional) – Whether to allow the query to use the local cache. Defaults to True.
- use_aggregate_cache (bool , optional) – Whether to allow the query to use the aggregate cache. Defaults to True.
- timeout (int , optional) – The number of minutes to wait for a response before timing out. Defaults to 10.
- raise_multikey_warning (bool , optional) – Whether to warn if a query contains attributes that have multiple key columns. Defaults to True.
- use_postgres (bool , optional) – Whether to use Postgres dialect for inbound query. Will only work if the current organization is configured to use Postgres inbound queries. Defaults to False.
- Returns: A pandas DataFrame containing the query results.
- Return type: DataFrame
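A sketch of how the filter and ordering arguments fit together; the feature names are hypothetical, and `model` stands for a DataModel instance. Each filter is a dict keyed by feature query name:

```python
# Hypothetical feature query names.
feature_list = ["state", "month", "total_sales"]

# Each filter dict maps a feature query name to the comparison value.
filter_greater = {"total_sales": 10_000}
filter_in = {"state": ["CA", "OR", "WA"]}

# order_by takes (feature query name, direction) tuples.
order_by = [("total_sales", "DESC"), ("state", "ASC")]

# df = model.get_data(
#     feature_list=feature_list,
#     filter_greater=filter_greater,
#     filter_in=filter_in,
#     order_by=order_by,
#     limit=100,
# )
```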
get_data_direct
Generates an AtScale query against the data model to get the given features, translates it to a database query, and submits it directly to the database using the SQLConnection. The results are returned as a Pandas DataFrame. Be sure that values passed to filters match the data type of the feature being filtered.
- Parameters:
- dbconn (SQLConnection) – The connection to use to submit the query to the database.
- feature_list (List[str]) – The list of feature query names to query.
- filter_equals (Dict[str, Any], optional) – A dictionary of features to filter for equality to the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – A dictionary of features to filter greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – A dictionary of features to filter less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – A dictionary of features to filter greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – A dictionary of features to filter less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – A dictionary of features to filter not equal to the value. Defaults to None.
- filter_in (Dict[str, list], optional) – A dictionary of features to filter in a list. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – A dictionary of features to filter between the tuple values. Defaults to None.
- filter_like (Dict[str, str], optional) – A dictionary of features to filter like the value. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – A dictionary of features to filter rlike the value. Defaults to None.
- filter_null (List[str], optional) – A list of features to filter for null. Defaults to None.
- filter_not_null (List[str], optional) – A list of features to filter for not null. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [(‘feature_name_1’, ‘DESC’), (‘feature_2’, ‘ASC’) …]. Defaults to None for AtScale Engine default sorting.
- group_by (List[str], optional) – The groupby order for the query. Accepts a list of feature query names. Defaults to None to group in the order of the categorical features.
- limit (int , optional) – A limit to put on the query. Defaults to None.
- comment (str , optional) – A comment to put in the query. Defaults to None.
- use_aggs (bool , optional) – Whether to allow the query to use aggs. Defaults to True.
- gen_aggs (bool , optional) – Whether to allow the query to generate aggs. Defaults to True.
- raise_multikey_warning (str , optional) – Whether to warn if a query contains attributes that have multiple key columns. Defaults to True.
- Returns: The results of the query as a DataFrame
- Return type: DataFrame
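As an illustration of how the filter and ordering arguments fit together, the sketch below builds the plain Python structures these parameters expect. The feature names, `model`, and `dbconn` are hypothetical stand-ins, and the call itself is commented out because it needs a live SQLConnection:

```python
# Hypothetical feature query names; replace with names from your model.
feature_list = ["customer_region", "total_sales"]

# Filter dictionaries are keyed by feature query name.
filter_greater_or_equal = {"total_sales": 1000}
filter_in = {"customer_region": ["EMEA", "APAC"]}

# order_by takes (feature_query_name, direction) tuples.
order_by = [("total_sales", "DESC")]

# Requires a live SQLConnection; uncomment to run against your warehouse.
# df = model.get_data_direct(
#     dbconn=dbconn,
#     feature_list=feature_list,
#     filter_greater_or_equal=filter_greater_or_equal,
#     filter_in=filter_in,
#     order_by=order_by,
#     limit=10,
# )
```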
get_data_jdbc
Establishes a JDBC connection to AtScale with the supplied information, then submits the query against the published project and returns the results in a pandas DataFrame. Be sure that values passed to filters match the data type of the feature being filtered.
- Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- filter_equals (Dict[str, Any], optional) – Filters results based on the feature equaling the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – Filters results based on the feature being greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – Filters results based on the feature being less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – Filters results based on the feature being greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – Filters results based on the feature being less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – Filters results based on the feature not equaling the value. Defaults to None.
- filter_in (Dict[str, list], optional) – Filters results based on the feature being contained in the values. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – Filters results based on the feature being between the values. Defaults to None.
- filter_like (Dict[str, str], optional) – Filters results based on the feature being like the clause. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – Filters results based on the feature being matched by the regular expression. Defaults to None.
- filter_null (List[str], optional) – Filters results to show null values of the specified features. Defaults to None.
- filter_not_null (List[str], optional) – Filters results to exclude null values of the specified features. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC'), ...]. Defaults to None for AtScale Engine default sorting.
- group_by (List[str], optional) – The groupby order for the query. Accepts a list of feature query names. Defaults to None to group in the order of the categorical features.
- limit (int, optional) – Limit the number of results. Defaults to None for no limit.
- comment (str, optional) – A comment string to build into the query. Defaults to None for no comment.
- use_aggs (bool, optional) – Whether to allow the query to use aggs. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggs. Defaults to True.
- raise_multikey_warning (bool, optional) – Whether to warn if a query contains attributes that have multiple key columns. Defaults to True.
- Returns: A pandas DataFrame containing the query results.
- Return type: DataFrame
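A minimal sketch of a date-range query over JDBC; the feature names are hypothetical, and the call is commented out because it needs a published project and network access to an AtScale instance:

```python
# filter_between maps a feature to a (low, high) tuple.
filter_between = {"order_date": ("2023-01-01", "2023-12-31")}
filter_not_null = ["customer_region"]

# Uncomment with a live connection to a published project.
# df = model.get_data_jdbc(
#     feature_list=["order_date", "customer_region", "total_sales"],
#     filter_between=filter_between,
#     filter_not_null=filter_not_null,
#     limit=100,
# )
```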
get_data_spark
Uses the provided spark_session to execute a query generated by the AtScale query engine against the data model. Returns the results in a spark DataFrame. Be sure that values passed to filters match the data type of the feature being filtered.
- Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- spark_session (pyspark.sql.SparkSession) – The pyspark SparkSession to execute the query with.
- filter_equals (Dict[str, Any], optional) – Filters results based on the feature equaling the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – Filters results based on the feature being greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – Filters results based on the feature being less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – Filters results based on the feature being greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – Filters results based on the feature being less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – Filters results based on the feature not equaling the value. Defaults to None.
- filter_in (Dict[str, list], optional) – Filters results based on the feature being contained in the values. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – Filters results based on the feature being between the values. Defaults to None.
- filter_like (Dict[str, str], optional) – Filters results based on the feature being like the clause. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – Filters results based on the feature being matched by the regular expression. Defaults to None.
- filter_null (List[str], optional) – Filters results to show null values of the specified features. Defaults to None.
- filter_not_null (List[str], optional) – Filters results to exclude null values of the specified features. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC'), ...]. Defaults to None for AtScale Engine default sorting.
- group_by (List[str], optional) – The groupby order for the query. Accepts a list of feature query names. Defaults to None to group in the order of the categorical features.
- limit (int, optional) – Limit the number of results. Defaults to None for no limit.
- comment (str, optional) – A comment string to build into the query. Defaults to None for no comment.
- use_aggs (bool, optional) – Whether to allow the query to use aggs. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggs. Defaults to True.
- raise_multikey_warning (bool, optional) – Whether to warn if a query contains attributes that have multiple key columns. Defaults to True.
- Returns: A pyspark DataFrame containing the query results.
- Return type: pyspark.sql.dataframe.DataFrame
get_data_spark_jdbc
Uses the provided information to establish a JDBC connection to the underlying data warehouse, generates a query against the data model, and uses the provided spark_session to execute it. Returns the results in a spark DataFrame. Be sure that values passed to filters match the data type of the feature being filtered.
- Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- spark_session (pyspark.sql.SparkSession) – The pyspark SparkSession to execute the query with.
- jdbc_format (str) – The driver class name. For example: 'jdbc', 'net.snowflake.spark.snowflake', 'com.databricks.spark.redshift'.
- jdbc_options (Dict[str, str]) – A case-insensitive dictionary of JDBC connection options.
- filter_equals (Dict[str, Any], optional) – Filters results based on the feature equaling the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – Filters results based on the feature being greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – Filters results based on the feature being less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – Filters results based on the feature being greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – Filters results based on the feature being less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – Filters results based on the feature not equaling the value. Defaults to None.
- filter_in (Dict[str, list], optional) – Filters results based on the feature being contained in the values. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – Filters results based on the feature being between the values. Defaults to None.
- filter_like (Dict[str, str], optional) – Filters results based on the feature being like the clause. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – Filters results based on the feature being matched by the regular expression. Defaults to None.
- filter_null (List[str], optional) – Filters results to show null values of the specified features. Defaults to None.
- filter_not_null (List[str], optional) – Filters results to exclude null values of the specified features. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC'), ...]. Defaults to None for AtScale Engine default sorting.
- group_by (List[str], optional) – The groupby order for the query. Accepts a list of feature query names. Defaults to None to group in the order of the categorical features.
- limit (int, optional) – Limit the number of results. Defaults to None for no limit.
- comment (str, optional) – A comment string to build into the query. Defaults to None for no comment.
- use_aggs (bool, optional) – Whether to allow the query to use aggs. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggs. Defaults to True.
- raise_multikey_warning (bool, optional) – Whether to warn if a query contains attributes that have multiple key columns. Defaults to True.
- Returns: A pyspark DataFrame containing the query results.
- Return type: pyspark.sql.dataframe.DataFrame
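The `jdbc_format`/`jdbc_options` pair mirrors what Spark's `spark.read.format(...).options(...)` reader expects. A sketch for a Snowflake warehouse; every option key and value below is an illustrative assumption, so consult your warehouse's Spark connector documentation for the exact names:

```python
# Illustrative Snowflake Spark-connector settings; all values are
# placeholder assumptions, not real credentials.
jdbc_format = "net.snowflake.spark.snowflake"
jdbc_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "analyst",
    "sfDatabase": "SALES_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

# Requires a SparkSession and a connected model; uncomment to execute.
# df = model.get_data_spark_jdbc(
#     feature_list=["customer_region", "total_sales"],
#     spark_session=spark,
#     jdbc_format=jdbc_format,
#     jdbc_options=jdbc_options,
# )
```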
get_database_query
Returns a database query generated using the data model to get the given features. Be sure that values passed to filters match the data type of the feature being filtered.
- Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- filter_equals (Dict[str, Any], optional) – A dictionary of features to filter for equality to the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – A dictionary of features to filter greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – A dictionary of features to filter less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – A dictionary of features to filter greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – A dictionary of features to filter less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – A dictionary of features to filter not equal to the value. Defaults to None.
- filter_in (Dict[str, list], optional) – A dictionary of features to filter in a list. Defaults to None.
- filter_not_in (Dict[str, list], optional) – A dictionary of features to filter not in a list. Defaults to None.
- filter_between (Dict[str, tuple], optional) – A dictionary of features to filter between the tuple values. Defaults to None.
- filter_like (Dict[str, str], optional) – A dictionary of features to filter like the value. Defaults to None.
- filter_not_like (Dict[str, str], optional) – A dictionary of features to filter not like the value. Defaults to None.
- filter_rlike (Dict[str, str], optional) – A dictionary of features to filter rlike the value. Defaults to None.
- filter_null (List[str], optional) – A list of features to filter for null. Defaults to None.
- filter_not_null (List[str], optional) – A list of features to filter for not null. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned query. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC'), ...]. Defaults to None for AtScale Engine default sorting.
- group_by (List[str], optional) – The groupby order for the query. Accepts a list of feature query names. Defaults to None to group in the order of the categorical features.
- limit (int, optional) – A limit to put on the query. Defaults to None.
- comment (str, optional) – A comment to put in the query. Defaults to None.
- use_aggs (bool, optional) – Whether to allow the query to use aggs. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggs. Defaults to True.
- raise_multikey_warning (bool, optional) – Whether to warn if a query contains attributes that have multiple key columns. Defaults to True.
- Returns: The generated database query
- Return type: str
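Because this method returns the SQL string rather than executing it, it is handy for inspecting or logging what would run. A sketch, assuming a `DataModel` object `model` and hypothetical feature names; the call is commented out since it needs a connected model:

```python
# A reusable kwargs dict keeps agg behavior consistent across calls.
no_agg_kwargs = {"use_aggs": False, "gen_aggs": False}

# query = model.get_database_query(
#     feature_list=["customer_region", "total_sales"],
#     filter_not_null=["customer_region"],
#     **no_agg_kwargs,
# )
# print(query)  # raw SQL, runnable with any warehouse client
```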
get_dataset
Gets the metadata of a dataset.
- Parameters: dataset_name (str) – The name of the dataset to pull.
- Returns: A dictionary of the metadata for the dataset.
- Return type: Dict
get_dataset_names
Gets the names of all datasets currently utilized by the DataModel and returns them as a list.
- Parameters: include_unused (bool , optional) – Also return the names of datasets in the project library, even if they are not used in the model. Defaults to False to only list datasets used in the model.
- Returns: list of dataset names
- Return type: List[str]
get_dimension_dataset_names
Gets the names of all dimension datasets currently utilized by the DataModel and returns them as a list.
- Returns: list of dimension dataset names
- Return type: List[str]
get_dimensions
Gets a dictionary of dictionaries with the published dimension names and metadata.
- Parameters: use_published (bool, optional) – Whether to get the dimensions of the published or draft data model. Defaults to True to use the published version.
- Returns: A dictionary of dictionaries where the dimension names are the keys in the outer dictionary, while the inner keys are the following: 'description', 'type' (value is Time or Standard).
- Return type: Dict
get_fact_dataset_names
Gets the names of all fact datasets currently utilized by the DataModel and returns them as a list.
- Returns: list of fact dataset names
- Return type: List[str]
get_feature_description
Returns the description of a given published feature.
- Parameters: feature (str) – The query name of the feature to retrieve the description of.
- Returns: The description of the given feature.
- Return type: str
get_feature_expression
Returns the expression of a given published feature.
- Parameters: feature (str) – The query name of the feature to return the expression of.
- Returns: The expression of the given feature.
- Return type: str
get_features
Gets the feature names and metadata for each feature in the published DataModel.
- Parameters:
- feature_list (List[str], optional) – A list of feature query names to return. Defaults to None to return all. All features in this list must exist in the model.
- folder_list (List[str], optional) – A list of folders to filter by. Defaults to None to ignore folder.
- feature_type (enums.FeatureType, optional) – The type of features to filter by. Options include enums.FeatureType.ALL, enums.FeatureType.CATEGORICAL, or enums.FeatureType.NUMERIC. Defaults to ALL.
- use_published (bool, optional) – Whether to get the features of the published or draft data model. Defaults to True to use the published version.
- Returns: A dictionary of dictionaries where the feature names are the keys in the outer dictionary, while the inner keys are the following: 'data_type' (value is a level-type, 'Aggregate', or 'Calculated'), 'description', 'expression', 'caption', 'folder', and 'feature_type' (value is Numeric or Categorical).
- Return type: Dict
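Since the return value is a plain dict of dicts, it can be filtered client-side with ordinary comprehensions. The sketch below uses a hard-coded stand-in for `model.get_features()` output, shaped after the documented inner keys:

```python
# Stand-in for model.get_features() output; names and values are
# hypothetical but follow the documented shape.
features = {
    "total_sales": {
        "data_type": "Aggregate", "feature_type": "Numeric",
        "description": "", "expression": "", "caption": "Total Sales",
        "folder": "Sales",
    },
    "customer_region": {
        "data_type": "Standard", "feature_type": "Categorical",
        "description": "", "expression": "", "caption": "Region",
        "folder": "Customer",
    },
}

# Pick out the numeric features client-side.
numeric = [name for name, meta in features.items()
           if meta["feature_type"] == "Numeric"]
```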
get_folders
Returns a list of the available folders in the published DataModel.
- Returns: A list of the available folders
- Return type: List[str]
get_hierarchies
Gets a dictionary of dictionaries with the published hierarchy names and metadata. Secondary attributes are treated as their own hierarchies; they are hidden by default but can be shown with the secondary_attribute parameter.
- Parameters:
- secondary_attribute (bool, optional) – Whether to include secondary attributes. True returns hierarchies and secondary attributes; False returns only non-secondary attributes. Defaults to False.
- folder_list (List[str], optional) – The list of folders in the data model containing hierarchies to exclusively list. Defaults to None to not filter by folder.
- use_published (bool, optional) – Whether to get the hierarchies of the published or draft data model. Defaults to True to use the published version.
- Returns: A dictionary of dictionaries where the hierarchy names are the keys in the outer dictionary, while the inner keys are the following: 'dimension', 'description', 'caption', 'folder', 'type' (value is Time or Standard), 'secondary_attribute'.
- Return type: Dict
get_hierarchy_levels
Gets a list of strings for the levels of a given published hierarchy.
- Parameters: hierarchy_name (str) – The query name of the hierarchy.
- Returns: A list containing the hierarchy's levels.
- Return type: List[str]
property id : str
Getter for the id instance variable.
- Returns: The id of this model
- Return type: str
is_perspective
Checks if this DataModel is a perspective.
- Returns: True if this is a perspective.
- Return type: bool
list_related_datasets
Returns a list of all fact datasets with relationships to the given hierarchy.
- Parameters: hierarchy_name (str) – The query name of a hierarchy to find relationships from.
- Returns: A list of the names of the datasets that have relationships to the hierarchy.
- Return type: List[str]
list_related_hierarchies
Returns a list of all hierarchies with relationships to the given dataset.
- Parameters: dataset_name (str) – The name of a fact dataset to find relationships from.
- Returns: A list of the names of the hierarchies that have relationships to the dataset.
- Return type: List[str]
property name : str
Getter for the name instance variable. The name of the data model.
- Returns: The textual identifier for the data model.
- Return type: str
property project : Project
Getter for the Project instance variable.
- Returns: The Project object this model belongs to.
- Return type: Project
submit_atscale_query
Submits the given query against the published project and returns the results in a pandas DataFrame.
- Parameters:
- query (str) – The SQL query to submit.
- use_aggs (bool , optional) – Whether to allow the query to use aggs. Defaults to True.
- gen_aggs (bool , optional) – Whether to allow the query to generate aggs. Defaults to True.
- fake_results (bool , optional) – Whether to use fake results, often used to train aggregates with queries that will frequently be used. Defaults to False.
- use_local_cache (bool , optional) – Whether to allow the query to use the local cache. Defaults to True.
- use_aggregate_cache (bool , optional) – Whether to allow the query to use the aggregate cache. Defaults to True.
- timeout (int , optional) – The number of minutes to wait for a response before timing out. Defaults to 10.
- Returns: A pandas DataFrame containing the query results.
- Return type: DataFrame
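A sketch of submitting raw SQL against the published model; the query text and model name are hypothetical, and the call is commented out because it needs a live connection:

```python
# Hypothetical SQL against a published model named "my_model".
query = 'SELECT "customer_region", "total_sales" FROM "my_model"'

# Uncomment with a live connection; fake_results=True can be used to
# train aggregates without waiting for real results.
# df = model.submit_atscale_query(query, use_aggs=True, timeout=10)
```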
update_aggregate_feature
Update the metadata for an aggregate feature.
- Parameters:
- feature_name (str) – The query name of the feature to update.
- description (str, optional) – The new description for the feature. Defaults to None to leave unchanged.
- caption (str, optional) – The new caption for the feature. Defaults to None to leave unchanged.
- folder (str, optional) – The new folder to put the feature in. Defaults to None to leave unchanged.
- format_string (Union[enums.FeatureFormattingType, str], optional) – The new format string for the feature. Defaults to None to leave unchanged.
- visible (bool, optional) – Whether the feature will be visible to BI tools. Defaults to None to leave unchanged.
- publish (bool, optional) – Whether the updated project should be published. Defaults to True.
update_calculated_column
Updates the SQL expression for a calculated column.
- Parameters:
- dataset_name (str) – The dataset the calculated column exists in.
- column_name (str) – The name of the column.
- expression (str) – The new SQL expression for the column.
- publish (bool, optional) – Whether the updated project should be published. Defaults to True.
update_calculated_feature
Update the metadata for a calculated feature.
- Parameters:
- feature_name (str) – The query name of the feature to update.
- expression (str, optional) – The new expression for the feature. Defaults to None to leave unchanged.
- description (str, optional) – The new description for the feature. Defaults to None to leave unchanged.
- caption (str, optional) – The new caption for the feature. Defaults to None to leave unchanged.
- folder (str, optional) – The new folder to put the feature in. Defaults to None to leave unchanged.
- format_string (Union[enums.FeatureFormattingType, str], optional) – The new format string for the feature. Defaults to None to leave unchanged.
- visible (bool, optional) – Whether the updated feature should be visible. Defaults to None to leave unchanged.
- publish (bool, optional) – Whether the updated project should be published. Defaults to True.
update_categorical_feature
Updates the metadata for an existing categorical feature.
- Parameters:
- feature_name (str) – The name of the feature to update.
- description (str , optional) – The new description for the feature. Defaults to None to leave unchanged.
- caption (str , optional) – The new caption for the feature. Defaults to None to leave unchanged.
- folder (str , optional) – The new folder to put the feature in. Defaults to None to leave unchanged.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
update_dataset
Updates aggregate settings for an existing Dataset in the data model.
- Parameters:
- dataset_name (str) – The display and query name of the dataset to edit
- allow_aggregates (bool , optional) – The new setting for if aggregates are allowed to be built off of this QDS. Defaults to None for no update.
- create_hinted_aggregate (bool , optional) – The setting for if an aggregate table is generated for all measures and keys in this QDS. Defaults to None for no update.
- incremental_indicator (str, optional) – The indicator column for incremental builds. Defaults to None for no update.
- grace_period (int , optional) – The grace period for incremental builds. Defaults to None for no update.
- safe_to_join_to_incremental (bool , optional) – Whether it is safe to join from this dataset to one with incremental builds enabled. Defaults to None for no update.
- create_fact_from_dimension (bool , optional) – Whether to create a fact dataset if the current dataset is only used with dimensions. Defaults to False.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
update_dimension
Update the metadata for the given dimension
- Parameters:
- dimension_name (str) – The name of the dimension to update.
- description (str , optional) – The new description for the dimension. Defaults to None to not change.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
update_hierarchy
Update the metadata for the given hierarchy
- Parameters:
- hierarchy_name (str) – The name of the hierarchy to update.
- caption (str , optional) – The new caption for the hierarchy. Defaults to None to not change.
- description (str , optional) – The new description for the hierarchy. Defaults to None to not change.
- folder (str , optional) – The new folder for the hierarchy. Defaults to None to not change.
- default_member_expression (str , optional) – The expression for the default member. Defaults to None to not change.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
update_perspective
Updates a perspective to hide the inputs. All items to hide should be in the inputs even if previously hidden.
- Parameters:
- perspective_name (str) – The name of the perspective to update.
- dimensions (List[str], optional) – Dimensions to hide. Defaults to None.
- hierarchies (List[str], optional) – Query names of hierarchies to hide. Defaults to None.
- categorical_features (List[str], optional) – Query names of categorical features to hide. Defaults to None.
- numeric_features (List[str], optional) – Query names of numeric features to hide. Defaults to None.
- publish (bool , optional) – Whether to publish the updated project. Defaults to True.
- Returns: The DataModel object for the updated perspective.
- Return type: DataModel
validate_mdx
Verifies whether the given MDX expression is valid for the current data model.
- Parameters: expression (str) – The MDX expression for the feature.
- Returns: True if the MDX is valid.
- Return type: bool
write_feature_importance
Writes the dataframe with columns containing feature query names and their importances to a table in the database accessed by dbconn with the given table_name. Then builds the created table into the data model so the importances can be queried.
- Parameters:
- dbconn (SQLConnection) – connection to the database; should be the same one the model and project are based on
- table_name (str) – the name for the table to be created for the given DataFrame
- dataframe (pd.DataFrame) – the pandas DataFrame to write to the database
- feature_name_prefix (str) – string to prepend to new feature query names to make them easily identifiable
- folder (str) – The folder to put the newly created items in. Defaults to None.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
- if_exists (enums.TableExistsAction, optional) – What to do if a table with table_name already exists. Defaults to enums.TableExistsAction.ERROR.
- warehouse_id (str , optional) – The id of the warehouse at which the data model and this dataset point. Defaults to None to use the warehouse_id of existing datasets in the model.
- check_permissions (bool , optional) – Whether to error if the atscale warehouse connection does not have the select privileges on the new table. Defaults to True.
writeback
Writes the dataframe to a table in the database accessed by dbconn with the given table_name. Joins that table to this DataModel by joining on the given join_features or join_columns.
- Parameters:
- dbconn (SQLConnection) – connection to the database; should be the same one the model and project are based on
- table_name (str) – the name for the table to be created for the given DataFrame
- dataframe (pd.DataFrame) – the pandas DataFrame to write to the database
- join_features (list) – A list of feature query names in the data model to use for joining.
- join_columns (list, optional) – The column names in the dataframe to join to the join_features. The list must be either None or the same length and order as join_features. Defaults to None to use identical names to the join_features. If multiple columns are needed for a single join, they should be in a nested list.
- roleplay_features (list, optional) – The roleplays to use on the relationships. The list must be either None or the same length and order as join_features. Use '' to not roleplay that relationship. Defaults to None.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
- if_exists (enums.TableExistsAction , optional) – What to do if a table with table_name already exists. Defaults to enums.TableExistsAction.ERROR.
- warehouse_id (str , optional) – The id of the warehouse at which the data model and this dataset point. Defaults to None to use the warehouse_id of existing datasets in the model.
- check_permissions (bool , optional) – Whether to error if the atscale warehouse connection does not have the select privileges on the new table. Defaults to True.
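A minimal sketch of writing predictions back and joining them into the model. The table name, column names, and the `model`/`dbconn` objects are hypothetical, and the call is commented out because it needs a live SQLConnection:

```python
import pandas as pd

# Hypothetical predictions keyed by a feature assumed to exist in the
# data model ("customer_id").
predictions = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "churn_score": [0.12, 0.87, 0.45],
})

# Uncomment with a live SQLConnection to create the table and join it
# to the model on customer_id.
# model.writeback(
#     dbconn=dbconn,
#     table_name="churn_predictions",
#     dataframe=predictions,
#     join_features=["customer_id"],
# )
```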
writeback_spark
Writes the pyspark dataframe to a table in the database accessed via jdbc with the given table_name. Joins that table to this DataModel by joining on the given join_features or join_columns.
- Parameters:
- pyspark_dataframe (pyspark.sql.dataframe.DataFrame) – The pyspark dataframe to write
- schema (str) – The name of the schema (second part of the three part name) for the table to be created in for the given PySpark DataFrame.
- table_name (str) – The name for the table to be created for the given PySpark DataFrame.
- join_features (list) – A list of feature query names in the data model to use for joining.
- join_columns (list, optional) – The columns in the dataframe to join to the join_features. The list must be either None or the same length and order as join_features. Defaults to None to use identical names to the join_features. If multiple columns are needed for a single join, they should be in a nested list.
- roleplay_features (list, optional) – The roleplays to use on the relationships. The list must be either None or the same length and order as join_features. Use '' to not roleplay that relationship. Defaults to None.
- warehouse_id (str , optional) – The warehouse id to use which points at the warehouse of dbconn and that the data model points at. Defaults to None, to use the warehouse previously used in the data model.
- database (str , optional) – The name of the database (first part of the three part name if applicable) for the table to be created in for the given PySpark DataFrame. Defaults to None.
- publish (bool , optional) – Whether or not the updated project should be published. Defaults to True.
- if_exists (enums.TableExistsAction , optional) – What to do if a table with table_name already exists. Defaults to enums.TableExistsAction.ERROR.
- check_permissions (bool , optional) – Whether to error if the atscale warehouse connection does not have the select privileges on the new table. Defaults to True.
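The join_features/join_columns length constraint above is easy to get wrong, so a sketch may help. The model name, feature names, and DataFrame below are hypothetical placeholders; the `check_join_args` helper is not part of the AtScale API, it only mirrors the documented constraint.

```python
# Hypothetical helper mirroring the documented rule: join_columns must be
# None or the same length and order as join_features (None falls back to
# columns named identically to the join_features).
from typing import List, Optional

def check_join_args(join_features: List[str],
                    join_columns: Optional[list]) -> list:
    if join_columns is None:
        return list(join_features)  # use the feature names as column names
    if len(join_columns) != len(join_features):
        raise ValueError(
            "join_columns must be None or match join_features in length and order"
        )
    return join_columns

join_features = ["customer_id", "order_date"]          # feature query names
join_columns = check_join_args(join_features, None)    # → same names reused

# With a live project and SparkSession, the call would look like (not run here):
# sales_model.writeback_spark(
#     pyspark_dataframe=predictions_df,   # a pyspark.sql.DataFrame
#     schema="ml_outputs",                # second part of the three-part name
#     table_name="churn_predictions",
#     join_features=join_features,
#     join_columns=join_columns,
#     if_exists=enums.TableExistsAction.ERROR,
# )
```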
writeback_spark_jdbc
Writes the PySpark DataFrame to a table, with the given table_name, in the database accessed via JDBC. Joins that table to this DataModel on the given join_features and join_columns.
- Parameters:
- dbconn (SQLConnection) – Connection to the database; should be the same one the model and project are based on.
- pyspark_dataframe (pyspark.sql.dataframe.DataFrame) – The PySpark DataFrame to write.
- jdbc_format (str) – The driver class name, for example: ‘jdbc’, ‘net.snowflake.spark.snowflake’, or ‘com.databricks.spark.redshift’.
- jdbc_options (Dict *[*str *,*str ]) – A case-insensitive dictionary of JDBC connection options. The query option is dynamically generated by AtScale, so including a table or query parameter can cause issues.
- join_features (list) – A list of feature query names in the data model to use for joining.
- join_columns (list , optional) – The columns in the dataframe to join to the join_features. The list must be either None or the same length and order as join_features. Defaults to None to use names identical to the join_features. If multiple columns are needed for a single join, they should be passed as a nested list.
- roleplay_features (list , optional) – The roleplays to use on the relationships. List must be either None or the same length and order as join_features. Use ‘’ to not roleplay that relationship. Defaults to None.
- table_name (str , optional) – The name for the table to be created for the given PySpark DataFrame. Can be None if a name is specified in jdbc_options.
- warehouse_id (str , optional) – The id of the warehouse that the data model points at and that should be used for the writeback. Defaults to None to use the warehouse already used in the project.
- publish (bool , optional) – Whether the updated project should be published. Defaults to True.
- if_exists (enums.TableExistsAction , optional) – What to do if a table with table_name already exists. Defaults to enums.TableExistsAction.ERROR.
- check_permissions (bool , optional) – Whether to error if the atscale warehouse connection does not have the select privileges on the new table. Defaults to True.
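Because AtScale generates the query option itself, jdbc_options should not carry its own table or query keys. The sketch below illustrates this; the connection values, model name, and the `clean_jdbc_options` helper are hypothetical, not part of the AtScale API — only the jdbc_options caveat and the parameter names come from the docs above.

```python
# Hypothetical helper: drop option keys that would collide with the query
# AtScale generates. Keys are compared case-insensitively, matching the
# documented behavior of jdbc_options.
from typing import Dict

def clean_jdbc_options(options: Dict[str, str]) -> Dict[str, str]:
    forbidden = {"table", "dbtable", "query"}
    return {k: v for k, v in options.items() if k.lower() not in forbidden}

opts = clean_jdbc_options({
    "url": "jdbc:snowflake://example.snowflakecomputing.com",  # placeholder
    "user": "writer",
    "dbtable": "dropped",  # would conflict with the generated query option
})

# With a live SQLConnection and SparkSession, the call would look like (not run here):
# sales_model.writeback_spark_jdbc(
#     dbconn=dbconn,                         # same warehouse as the project
#     pyspark_dataframe=predictions_df,
#     jdbc_format="net.snowflake.spark.snowflake",
#     jdbc_options=opts,
#     join_features=["customer_id"],
#     table_name="churn_predictions",        # or None if named in jdbc_options
# )
```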