pca

atscale.eda.pca.pca

Performs principal component analysis (PCA) on the specified numeric features. Currently, only Snowflake connections are supported.
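For reference, standard PCA computes the quantities below. This is the textbook formulation, not a description of atscale's internal queries. It assumes the features are mean-centered and that the returned "percent weights" are each component's share of total variance; this page does not say so explicitly. Here $X$ is the centered $n \times p$ matrix of the $p$ numeric features, with one row per combination of granularity_levels.

```latex
C = \frac{1}{n-1} X^\top X, \qquad
C\,v_i = \lambda_i v_i, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

\text{percent weight}_i = 100 \cdot \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j},
\qquad i = 1, \dots, \texttt{pc\_num}
```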

  • Parameters:
    • dbconn (Snowflake) – The database connection that pca will interact with
    • data_model (DataModel) – The data model corresponding to the features provided
    • pc_num (int) – The number of principal components to return. Must be between 1 and the number of numeric features being analyzed, inclusive.
    • numeric_features (List[str]) – The query names of the numeric features to analyze with pca
    • granularity_levels (List[str]) – The query names of the categorical features that set the level of granularity at which numeric_features are analyzed
    • if_exists (enums.TableExistsAction, optional) – The action pca takes when a table it creates already exists. APPEND and IGNORE are not accepted. Defaults to ERROR.
    • write_database (str, optional) – The database that pca writes tables to. Defaults to the database associated with the given dbconn.
    • write_schema (str, optional) – The schema that pca writes tables to. Defaults to the schema associated with the given dbconn.
  • Returns: A pair of DataFrames: the first contains the principal components and the second contains their percent weights
  • Return type: Tuple[DataFrame, DataFrame]
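The snippet below is a minimal local NumPy sketch of standard covariance-based PCA, meant only to illustrate what the two return values represent and how the pc_num range check behaves. `pca_sketch` is a hypothetical helper, not part of atscale: the real function runs against Snowflake and returns DataFrames whose exact column layout is not documented here. It also assumes that "percent weights" means the percentage of total variance explained by each component.

```python
from typing import Tuple

import numpy as np


def pca_sketch(features: np.ndarray, pc_num: int) -> Tuple[np.ndarray, np.ndarray]:
    """Return (components, percent_weights) for a rows-by-features matrix.

    Hypothetical illustration only; not atscale's implementation.
    """
    n_features = features.shape[1]
    # Mirror the documented constraint: 1 <= pc_num <= number of numeric features.
    if not 1 <= pc_num <= n_features:
        raise ValueError(f"pc_num must be in [1, {n_features}], got {pc_num}")

    # Center each feature column at zero mean.
    centered = features - features.mean(axis=0)
    # Sample covariance of the columns; atleast_2d keeps the single-feature case 2-D.
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Each kept component's share of the total variance, as a percentage.
    percent_weights = 100 * eigvals[:pc_num] / eigvals.sum()
    # Columns of eigvecs are the principal component directions (loadings).
    components = eigvecs[:, :pc_num]
    return components, percent_weights


# Example: three features, the third strongly correlated with the first.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
x[:, 2] = x[:, 0] + 0.1 * rng.normal(size=200)
components, weights = pca_sketch(x, pc_num=2)
```

Because the third feature nearly duplicates the first, the first component should carry a clearly larger share of the variance than the second. Keeping all components (pc_num equal to the number of features) makes the percent weights sum to 100.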