data_model

class atscale.data_model.data_model.DataModel

Represents an AtScale data model. Takes an existing model id and an AtScale Catalog object and constructs an object that handles functionality related to the model's datasets and columns, as well as reading data.

property caption : str

Getter for the caption instance variable.

Returns: The caption of this model.
Return type: str

property catalog : Catalog

Getter for the catalog instance variable.

Returns: The Catalog object this model belongs to.
Return type: Catalog

column_exists

Checks if the given column name exists in the dataset.

Parameters:
- dataset_name (str) – The name of the dataset to pull the columns from, case-sensitive.
- column_name (str) – The name of the column to check, case-sensitive.
Returns: True if the column name is found, else False.
Return type: bool

dataset_exists

Returns whether a dataset with the given name exists in the data model. The match is case-sensitive.

Parameters: dataset_name (str) – The name of the dataset to look for.
Returns: True if the dataset name is found, else False.
Return type: bool
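
A minimal sketch of how the two existence checks might be combined. `StubModel` and `has_column` are hypothetical names written only for this example: the stub mimics the documented case-sensitive matching and is not the AtScale implementation. Since the behavior of column_exists for a missing dataset is not stated above, the helper checks the dataset first.

```python
class StubModel:
    """Hypothetical stand-in for a DataModel; not the AtScale implementation."""

    def __init__(self, datasets):
        # Maps dataset name -> list of column names (illustrative data only).
        self._datasets = datasets

    def dataset_exists(self, dataset_name):
        # Exact, case-sensitive match, as documented.
        return dataset_name in self._datasets

    def column_exists(self, dataset_name, column_name):
        # Exact, case-sensitive match on the column name.
        return column_name in self._datasets.get(dataset_name, [])


def has_column(model, dataset_name, column_name):
    """Check the dataset first so a missing dataset is reported as False."""
    if not model.dataset_exists(dataset_name):
        return False
    return model.column_exists(dataset_name, column_name)


model = StubModel({"sales_fact": ["order_id", "Amount"]})
print(has_column(model, "sales_fact", "Amount"))  # exact case
print(has_column(model, "sales_fact", "amount"))  # column case differs
print(has_column(model, "Sales_Fact", "Amount"))  # dataset case differs
```

With a real DataModel instance, the same `has_column` helper could be passed the model object in place of the stub.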

generate_time_series_features

Generates time series features, such as rolling statistics and period-to-date values, for the given numeric features using the time hierarchy from the given data model. The core of the function is built around the DataFrame groupby operation, like so:

dataframe[groupby(group_features + hierarchy_levels)][shift(shift_amount)][rolling(interval)][{aggregate function}]

Parameters:
- dataframe (pandas.DataFrame) – the pandas dataframe with the features.
- numeric_features (List[str]) – The list of numeric feature query names to build time series features from.
- time_hierarchy (str) – The query name of the time hierarchy to use to derive features.
- level (str) – The query name of the level within the time hierarchy to derive the features at.
- group_features (List[str], optional) – The list of features to group by. Note that this acts as a logical grouping as opposed to a dimensionality reduction when paired with shifts or intervals. Defaults to None.
- intervals (List[int], optional) – The intervals to create the features over. Will use default values based on the time step of the given level if None. Defaults to None.
- shift_amount (int, optional) – The number of rows to shift the new features. Defaults to 0.
Returns: A DataFrame containing the original columns and the newly generated ones.
Return type: DataFrame
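
The group, shift, rolling, aggregate pattern described above can be illustrated without any library. The following is a plain-Python sketch, not the function's actual implementation: `rolling_sum_features`, the sample rows, and the generated column name are all assumptions made for this example, and only a rolling sum is shown.

```python
from collections import defaultdict


def rolling_sum_features(rows, group_key, time_key, value_key, interval, shift_amount=0):
    """Add a rolling-sum column per group: group -> shift -> rolling -> aggregate."""
    out_key = f"{value_key}_{interval}_sum"  # hypothetical output column name
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for members in groups.values():
        ordered = sorted(members, key=lambda r: r[time_key])
        values = [r[value_key] for r in ordered]
        # Shift: row i sees the value from shift_amount rows earlier, or None.
        shifted = [
            values[i - shift_amount] if 0 <= i - shift_amount < len(values) else None
            for i in range(len(values))
        ]
        for i, row in enumerate(ordered):
            window = shifted[max(0, i - interval + 1): i + 1]
            # Emit a value only when the window is full and has no gaps.
            complete = len(window) == interval and None not in window
            result.append({**row, out_key: sum(window) if complete else None})
    return result


rows = [
    {"region": "east", "month": 1, "sales": 10},
    {"region": "east", "month": 2, "sales": 20},
    {"region": "east", "month": 3, "sales": 30},
    {"region": "west", "month": 1, "sales": 5},
    {"region": "west", "month": 2, "sales": 7},
]
features = rolling_sum_features(rows, "region", "month", "sales", interval=2)
shifted_features = rolling_sum_features(rows, "region", "month", "sales", interval=2, shift_amount=1)
```

Note how grouping here keeps every input row and only restricts which rows feed each window, which matches the "logical grouping" remark on group_features.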

get_all_categorical_feature_names

Returns a list of all published categorical features (i.e., hierarchy levels and secondary attributes) in the given DataModel.

Parameters: folder (str, optional) – The name of a folder in the DataModel containing features to exclusively list. Defaults to None to not filter by folder.
Returns: A list of the query names of categorical features in the DataModel and, if given, in the folder.
Return type: List[str]

get_all_numeric_feature_names

Returns a list of all published numeric features (i.e., aggregate and calculated measures) in the data model.

Parameters: folder (str, optional) – The name of a folder in the data model containing measures to exclusively list. Defaults to None to not filter by folder.
Returns: A list of the query names of numeric features in the data model and, if given, in the folder.
Return type: List[str]

get_columns

Gets all currently visible columns in a given dataset, case-sensitive.

Parameters: dataset_name (str) – the name of the dataset to get columns from, case-sensitive.
Returns: the columns in the given dataset
Return type: Dict

get_connected_warehouse

Returns information about the warehouse used by this data model.

Returns: A dictionary describing the connected warehouse
Return type: Dict

get_data

Submits a query against the data model using the supplied information and returns the results in a pandas DataFrame. Be sure that values passed to filters match the data type of the feature being filtered. Decimal precision in returned numeric features may differ from other variations of the get_data function.

Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- filter_equals (Dict[str, Any], optional) – Filters results based on the feature equaling the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – Filters results based on the feature being greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – Filters results based on the feature being less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – Filters results based on the feature being greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – Filters results based on the feature being less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – Filters results based on the feature not equaling the value. Defaults to None.
- filter_in (Dict[str, list], optional) – Filters results based on the feature being contained in the values. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – Filters results based on the feature being between the values. Defaults to None.
- filter_like (Dict[str, str], optional) – Filters results based on the feature being like the clause. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – Filters results based on the feature being matched by the regular expression. Defaults to None.
- filter_null (List[str], optional) – Filters results to show null values of the specified features. Defaults to None.
- filter_not_null (List[str], optional) – Filters results to exclude null values of the specified features. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC') …]. Defaults to None for AtScale Engine default sorting.
- limit (int, optional) – Limit the number of results. Defaults to None for no limit.
- comment (str, optional) – A comment string to build into the query. Defaults to None for no comment.
- use_aggs (bool, optional) – Whether to allow the query to use aggregates. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggregates. Defaults to True.
- fake_results (bool, optional) – Whether to use fake results, often used to train aggregates with queries that will frequently be used. Defaults to False.
- use_local_cache (bool, optional) – Whether to allow the query to use the local cache. Defaults to True.
- use_aggregate_cache (bool, optional) – Whether to allow the query to use the aggregate cache. Defaults to True.
- timeout (int, optional) – The number of minutes to wait for a response before timing out. Defaults to 10.
- use_postgres (bool, optional) – Whether to use Postgres dialect for inbound query. Defaults to True.
Returns: A pandas DataFrame containing the query results.
Return type: DataFrame
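
As an illustration of how the arguments above fit together, the sketch below assembles them as a keyword dictionary. The feature query names and filter values are hypothetical, and `check_order_by` is a client-side helper invented for this example rather than part of the library. The final call is left as a comment because it needs a live DataModel instance.

```python
# Hypothetical feature query names; substitute names from your own model.
query_kwargs = {
    "feature_list": ["order_year", "region", "total_sales"],
    # Filter values should match the data type of the feature being filtered.
    "filter_in": {"region": ["East", "West"]},
    "filter_between": {"order_year": (2021, 2023)},
    "filter_greater_or_equal": {"total_sales": 1000},
    "filter_not_null": ["region"],
    "order_by": [("total_sales", "DESC"), ("order_year", "ASC")],
    "limit": 100,
    "timeout": 5,  # minutes
}


def check_order_by(order_by):
    """Verify each entry is a (feature, direction) pair using ASC or DESC."""
    for feature, direction in order_by:
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"Unexpected ordering {direction!r} for {feature!r}")
    return True


check_order_by(query_kwargs["order_by"])

# With a DataModel instance named data_model (not constructed here):
# df = data_model.get_data(**query_kwargs)
```

Keeping the arguments in one dictionary makes it easy to reuse the same filters with the other get_data variants, which share most of these parameter names.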

get_data_direct

Generates an AtScale query against the data model to get the given features, translates it to a database query, and submits it directly to the database using the SQLConnection. The results are returned as a pandas DataFrame. Be sure that values passed to filters match the data type of the feature being filtered. Decimal precision in returned numeric features may differ from other variations of the get_data function.

Parameters:
- dbconn (SQLConnection) – The connection to use to submit the query to the database.
- feature_list (List[str]) – The list of feature query names to query.
- filter_equals (Dict[str, Any], optional) – A dictionary of features to filter for equality to the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – A dictionary of features to filter greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – A dictionary of features to filter less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – A dictionary of features to filter greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – A dictionary of features to filter less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – A dictionary of features to filter not equal to the value. Defaults to None.
- filter_in (Dict[str, list], optional) – A dictionary of features to filter in a list. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – A dictionary of features to filter between the tuple values. Defaults to None.
- filter_like (Dict[str, str], optional) – A dictionary of features to filter like the value. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – A dictionary of features to filter rlike the value. Defaults to None.
- filter_null (List[str], optional) – A list of features to filter for null. Defaults to None.
- filter_not_null (List[str], optional) – A list of features to filter for not null. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC') …]. Defaults to None for AtScale Engine default sorting.
- limit (int, optional) – A limit to put on the query. Defaults to None.
- comment (str, optional) – A comment to put in the query. Defaults to None.
- use_aggs (bool, optional) – Whether to allow the query to use aggregates. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggregates. Defaults to True.
Returns: The results of the query as a DataFrame.
Return type: DataFrame

get_data_spark

Uses the provided spark_session to execute a query generated by the AtScale query engine against the data model. Returns the results in a spark DataFrame. Be sure that values passed to filters match the data type of the feature being filtered. Decimal precision in returned numeric features may differ from other variations of the get_data function.

Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- spark_session (pyspark.sql.SparkSession) – The pyspark SparkSession to execute the query with.
- filter_equals (Dict[str, Any], optional) – Filters results based on the feature equaling the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – Filters results based on the feature being greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – Filters results based on the feature being less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – Filters results based on the feature being greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – Filters results based on the feature being less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – Filters results based on the feature not equaling the value. Defaults to None.
- filter_in (Dict[str, list], optional) – Filters results based on the feature being contained in the values. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – Filters results based on the feature being between the values. Defaults to None.
- filter_like (Dict[str, str], optional) – Filters results based on the feature being like the clause. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – Filters results based on the feature being matched by the regular expression. Defaults to None.
- filter_null (List[str], optional) – Filters results to show null values of the specified features. Defaults to None.
- filter_not_null (List[str], optional) – Filters results to exclude null values of the specified features. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC') …]. Defaults to None for AtScale Engine default sorting.
- limit (int, optional) – Limit the number of results. Defaults to None for no limit.
- comment (str, optional) – A comment string to build into the query. Defaults to None for no comment.
- use_aggs (bool, optional) – Whether to allow the query to use aggregates. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggregates. Defaults to True.
Returns: A pyspark DataFrame containing the query results.
Return type: pyspark.sql.dataframe.DataFrame

get_data_spark_jdbc

Uses the provided information to establish a JDBC connection to the underlying data warehouse. Generates a query against the data model and uses the provided spark_session to execute it. Returns the results in a spark DataFrame. Be sure that values passed to filters match the data type of the feature being filtered. Decimal precision in returned numeric features may differ from other variations of the get_data function.

Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- spark_session (pyspark.sql.SparkSession) – The pyspark SparkSession to execute the query with.
- jdbc_format (str) – The driver class name, for example: 'jdbc', 'net.snowflake.spark.snowflake', 'com.databricks.spark.redshift'.
- jdbc_options (Dict[str, str]) – Case-insensitive options that specify the JDBC connection.
- filter_equals (Dict[str, Any], optional) – Filters results based on the feature equaling the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – Filters results based on the feature being greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – Filters results based on the feature being less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – Filters results based on the feature being greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – Filters results based on the feature being less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – Filters results based on the feature not equaling the value. Defaults to None.
- filter_in (Dict[str, list], optional) – Filters results based on the feature being contained in the values. Defaults to None.
- filter_not_in (Dict[str, list], optional) – Filters results based on the feature not being contained in the values. Defaults to None.
- filter_between (Dict[str, tuple], optional) – Filters results based on the feature being between the values. Defaults to None.
- filter_like (Dict[str, str], optional) – Filters results based on the feature being like the clause. Defaults to None.
- filter_not_like (Dict[str, str], optional) – Filters results based on the feature not being like the clause. Defaults to None.
- filter_rlike (Dict[str, str], optional) – Filters results based on the feature being matched by the regular expression. Defaults to None.
- filter_null (List[str], optional) – Filters results to show null values of the specified features. Defaults to None.
- filter_not_null (List[str], optional) – Filters results to exclude null values of the specified features. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned dataframe. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC') …]. Defaults to None for AtScale Engine default sorting.
- limit (int, optional) – Limit the number of results. Defaults to None for no limit.
- comment (str, optional) – A comment string to build into the query. Defaults to None for no comment.
- use_aggs (bool, optional) – Whether to allow the query to use aggregates. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggregates. Defaults to True.
Returns: A pyspark DataFrame containing the query results.
Return type: pyspark.sql.dataframe.DataFrame

get_database_query

Returns a database query generated using the data model to get the given features. Be sure that values passed to filters match the data type of the feature being filtered.

Parameters:
- feature_list (List[str]) – The list of feature query names to query.
- filter_equals (Dict[str, Any], optional) – A dictionary of features to filter for equality to the value. Defaults to None.
- filter_greater (Dict[str, Any], optional) – A dictionary of features to filter greater than the value. Defaults to None.
- filter_less (Dict[str, Any], optional) – A dictionary of features to filter less than the value. Defaults to None.
- filter_greater_or_equal (Dict[str, Any], optional) – A dictionary of features to filter greater than or equal to the value. Defaults to None.
- filter_less_or_equal (Dict[str, Any], optional) – A dictionary of features to filter less than or equal to the value. Defaults to None.
- filter_not_equal (Dict[str, Any], optional) – A dictionary of features to filter not equal to the value. Defaults to None.
- filter_in (Dict[str, list], optional) – A dictionary of features to filter in a list. Defaults to None.
- filter_not_in (Dict[str, list], optional) – A dictionary of features to filter not in a list. Defaults to None.
- filter_between (Dict[str, tuple], optional) – A dictionary of features to filter between the tuple values. Defaults to None.
- filter_like (Dict[str, str], optional) – A dictionary of features to filter like the value. Defaults to None.
- filter_not_like (Dict[str, str], optional) – A dictionary of features to filter not like the value. Defaults to None.
- filter_rlike (Dict[str, str], optional) – A dictionary of features to filter rlike the value. Defaults to None.
- filter_null (List[str], optional) – A list of features to filter for null. Defaults to None.
- filter_not_null (List[str], optional) – A list of features to filter for not null. Defaults to None.
- order_by (List[Tuple[str, str]], optional) – The sort order for the returned query. Accepts a list of tuples of the feature query name and ordering respectively: [('feature_name_1', 'DESC'), ('feature_2', 'ASC') …]. Defaults to None for AtScale Engine default sorting.
- limit (int, optional) – A limit to put on the query. Defaults to None.
- comment (str, optional) – A comment to put in the query. Defaults to None.
- use_aggs (bool, optional) – Whether to allow the query to use aggregates. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggregates. Defaults to True.
Returns: The generated database query.
Return type: str

get_dataset

Gets the metadata of a dataset.

Parameters: dataset_name (str) – The name of the dataset to pull.
Returns: A dictionary of the metadata for the dataset.
Return type: Dict

get_dataset_names

Gets the names of all datasets currently used by the DataModel and returns them as a list.

Returns: A list of dataset names.
Return type: List[str]

get_dimension_dataset_names

Gets the names of all dimension datasets currently used by the DataModel and returns them as a list.

Returns: A list of dimension dataset names.
Return type: List[str]

get_dimensions

Gets a dictionary of dictionaries with the published dimension names and metadata.

Returns: A dictionary of dictionaries where the dimension names are the keys in the outer dictionary, while the inner keys are the following: 'description', 'type' (value is Time or Standard).
Return type: Dict

get_fact_dataset_names

Gets the names of all fact datasets currently used by the DataModel and returns them as a list.

Returns: A list of fact dataset names.
Return type: List[str]

get_feature_description

Returns the description of a given published feature.

Parameters: feature (str) – The query name of the feature to retrieve the description of.
Returns: The description of the given feature.
Return type: str

get_feature_expression

Returns the expression of a given published feature.

Parameters: feature (str) – The query name of the feature to return the expression of.
Returns: The expression of the given feature.
Return type: str

get_features

Gets the feature names and metadata for each feature in the published DataModel.

Parameters:
- feature_list (List[str], optional) – A list of feature query names to return. Defaults to None to return all. All features in this list must exist in the model.
- folder_list (List[str], optional) – A list of folders to filter by. Defaults to None to ignore folder.
- feature_type (enums.FeatureType, optional) – The type of features to filter by. Options include enums.FeatureType.ALL, enums.FeatureType.CATEGORICAL, or enums.FeatureType.NUMERIC. Defaults to ALL.
Returns: A dictionary of dictionaries where the feature names are the keys in the outer dictionary, while the inner keys are the following: 'data_type' (value is a level-type, 'Aggregate', or 'Calculated'), 'description', 'expression', 'caption', 'folder', and 'feature_type' (value is Numeric or Categorical).
Return type: Dict
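
To show how the returned dictionary of dictionaries might be consumed, here is a small sketch over a hand-written sample in the documented shape. The feature names and all values are illustrative only. In particular, the 'data_type' string for the level and the use of a plain string for 'folder' are assumptions, and `names_by_type` is a helper invented for this example.

```python
# Hand-written sample in the documented shape; not output from a real model.
features = {
    "total_sales": {
        "data_type": "Aggregate",
        "description": "Sum of the sales amount",
        "expression": "",
        "caption": "Total Sales",
        "folder": "Sales",  # assumed to be a plain string
        "feature_type": "Numeric",
    },
    "region": {
        "data_type": "Standard",  # placeholder for a level-type value
        "description": "Sales region",
        "expression": "",
        "caption": "Region",
        "folder": "Geography",
        "feature_type": "Categorical",
    },
}


def names_by_type(feature_metadata, feature_type):
    """Return the sorted feature names whose 'feature_type' matches."""
    return sorted(
        name
        for name, meta in feature_metadata.items()
        if meta["feature_type"] == feature_type
    )


numeric_names = names_by_type(features, "Numeric")
categorical_names = names_by_type(features, "Categorical")
print(numeric_names, categorical_names)
```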

get_folders

Returns a list of the available folders in the published DataModel.

Returns: A list of the available folders
Return type: List[str]

get_hierarchies

Gets a dictionary of dictionaries with the published hierarchy names and metadata. Secondary attributes are treated as their own hierarchies; they are hidden by default but can be shown with the secondary_attribute parameter.

Parameters:
- secondary_attribute (bool, optional) – Whether to include secondary attributes. True will return hierarchies and secondary attributes, False will return only non-secondary attributes. Defaults to False.
- folder_list (List[str], optional) – The list of folders in the data model containing hierarchies to exclusively list. Defaults to None to not filter by folder.
Returns: A dictionary of dictionaries where the hierarchy names are the keys in the outer dictionary, while the inner keys are the following: 'dimension', 'description', 'caption', 'folder', 'type' (value is Time or Standard), 'secondary_attribute'.
Return type: Dict
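
One likely use of this metadata is picking out time hierarchies, for example as candidates for the time_hierarchy argument of generate_time_series_features. The sketch below works on a hand-written sample in the documented shape: the hierarchy names and values are illustrative, and treating 'secondary_attribute' as a boolean is an assumption.

```python
# Hand-written sample in the documented shape; not output from a real model.
hierarchies = {
    "date_hierarchy": {
        "dimension": "Date",
        "description": "Calendar date hierarchy",
        "caption": "Date Hierarchy",
        "folder": "Time",
        "type": "Time",
        "secondary_attribute": False,  # assumed to be a boolean
    },
    "geo_hierarchy": {
        "dimension": "Geography",
        "description": "Country, state, and city",
        "caption": "Geography",
        "folder": "Geography",
        "type": "Standard",
        "secondary_attribute": False,
    },
}

# Keep only hierarchies whose documented 'type' value is Time.
time_hierarchies = [
    name for name, meta in hierarchies.items() if meta["type"] == "Time"
]
print(time_hierarchies)
```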

get_hierarchy_levels

Gets a list of strings for the levels of a given published hierarchy.

Parameters: hierarchy_name (str) – The query name of the hierarchy.
Returns: A list containing the hierarchy's levels.
Return type: List[str]

get_secondary_attributes_at_level

Gets the secondary attributes that are tied to the provided level.

Parameters: level (str) – The query name of the level to look up.
Returns: A list of attribute names.
Return type: List[str]

property id : str

Getter for the id instance variable

Returns: The id of this model
Return type: str

is_perspective()

Checks if this DataModel is a perspective.

Returns: True if this is a perspective, else False.
Return type: bool

property name : str

Getter for the name instance variable. The name of the data model.

Returns: The textual identifier for the data model.
Return type: str

submit_atscale_query

Submits the given query against the data model and returns the results in a pandas DataFrame.

Parameters:
- query (str) – The SQL query to submit.
- use_aggs (bool, optional) – Whether to allow the query to use aggregates. Defaults to True.
- gen_aggs (bool, optional) – Whether to allow the query to generate aggregates. Defaults to True.
- fake_results (bool, optional) – Whether to use fake results, often used to train aggregates with queries that will frequently be used. Defaults to False.
- use_local_cache (bool, optional) – Whether to allow the query to use the local cache. Defaults to True.
- use_aggregate_cache (bool, optional) – Whether to allow the query to use the aggregate cache. Defaults to True.
- timeout (int, optional) – The number of minutes to wait for a response before timing out. Defaults to 10.
Returns: A pandas DataFrame containing the query results.
Return type: DataFrame

validate_mdx

Verifies whether the given MDX expression is valid for the current data model.

Parameters: expression (str) – The MDX expression for the feature.
Returns: True if the MDX expression is valid.
Return type: bool
