Adding Databricks Data Warehouses
A Databricks data warehouse contains the tables and views that you want to import as fact datasets and dimension datasets into AtScale cubes. It also contains aggregate-table instances that either you or the AtScale engine creates in a database that you specify.
> Attention: To configure a Databricks data warehouse, you must have a special AtScale license. Ask your sales representative about obtaining a Databricks-enabled license.
Before you begin
- Ensure that your user ID in the Design Center is assigned the Super User role or the Manage Data Warehouses role permission. For details, see Assigning Individual Users to Roles.
- Generate a personal access token. Databricks does not support JDBC authentication with a username and password, so you must generate a personal access token before creating the connection. For details, see the Databricks or Azure documentation. When creating the Databricks connection (see below), use the token as the password and `token` as the username. A scripted way to generate a token is sketched after this list.
- Only for Databricks Spark: You must manually install the AtScale UDAFs. For details, see Customer-Managed UDAF Installation on a Databricks Cluster.
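If you prefer to script token creation, the sketch below uses the Databricks Token REST API (`POST /api/2.0/token/create`) via the `requests` library. The workspace URL, the pre-existing credential used to call the API, and the token comment and lifetime are placeholder assumptions; generating the token from User Settings in the Databricks UI works just as well.

```python
# Minimal sketch: create a Databricks personal access token with the
# Token API 2.0. Calling this API itself requires an existing
# credential, so many users generate the token from the UI instead.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
EXISTING_TOKEN = "<existing-token>"  # placeholder credential for the API call

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {EXISTING_TOKEN}"},
    json={
        "comment": "AtScale data warehouse connection",  # assumed label
        "lifetime_seconds": 90 * 24 * 3600,              # assumed 90-day lifetime
    },
    timeout=30,
)
resp.raise_for_status()
token_value = resp.json()["token_value"]

# Per the step above: the new token is the JDBC password, and the
# literal string "token" is the JDBC username.
print("username:", "token")
print("password:", token_value)
```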
Add the Data Warehouse
- Go to Settings > Data Warehouses and click Create Data Warehouse.
- In the Add a Data Warehouse dialog box, select one of the following for Type of Data Warehouse:
- Databricks Spark when using the Databricks Runtime.
- Databricks SQL when using the Photon Runtime with a Databricks SQL warehouse.
- Enter a unique name for the data warehouse. AtScale displays this name in the Design Center and uses it in log files.
- Enter the External Connection ID. By default, this value is generated from the Name field; you can override it by enabling the Override generated value slider.
- Enter the name of the Database for AtScale to use when creating aggregate tables. You must create this database yourself; AtScale does not create it automatically. (One way to pre-create it is sketched after this list.)
- Specify whether the data warehouse is a Read-only source:
- Select this option if AtScale will be configured to access this database with credentials that do not have permission to create new tables, or if you need to prevent AtScale from placing any aggregate tables in this data warehouse.
- If you enable Read-only source when editing an existing data warehouse that already contains aggregates, all aggregates stored in this data warehouse will be deactivated.
- Enter the name of the Aggregate Schema. AtScale reads and writes aggregate data in this schema. The AtScale service account user must have ALL privileges on it, and BI tool user accounts should not have SELECT permission on it.
- (Only for Databricks Spark) In the User Defined Aggregate Functions section, select the Custom Function Installation Mode:
- Customer Managed - If selected, the Databricks administrator installs the custom functions in a designated schema.
- None - Available for certain management edge cases. Do not use this setting when servicing end-user queries, because system performance will be degraded.
- In the Table storage formats section, specify the storage format for new tables (if you have a preference). Select default to let the AtScale engine use the format configured in the `aggregates.tableConfig.preferredStorageFormat` engine setting. For more information, see Other Settings for Both System-Defined and User-Defined Aggregates.
- If you need to use objects in DBFS, enter the URI, user name, and password in the DBFS file system section.
- Click Save to complete the setup.
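As noted in the Database step above, AtScale does not create the aggregate database for you. The sketch below pre-creates one using the `databricks-sql-connector` package; the hostname, HTTP path, access token, database name, and service-account principal are all placeholder assumptions, and the GRANT statement should be adapted to your workspace's access-control model.

```python
# Minimal sketch: pre-create the database that AtScale will use for
# aggregate tables, and grant the AtScale service account full
# privileges on it (as required by the Aggregate Schema step).
from databricks import sql  # pip install databricks-sql-connector

conn = sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",  # placeholder
    http_path="<http-path-from-cluster-or-sql-warehouse>",    # placeholder
    access_token="<personal-access-token>",                   # placeholder
)

with conn.cursor() as cur:
    # Database that AtScale will write aggregate tables into.
    cur.execute("CREATE DATABASE IF NOT EXISTS atscale_aggregates")
    # Hypothetical service-account principal; adjust to your workspace
    # and access-control model.
    cur.execute(
        "GRANT ALL PRIVILEGES ON DATABASE atscale_aggregates "
        "TO `atscale-service@example.com`"
    )

conn.close()
```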
Add a Databricks Connection
After setting up your data warehouse, you must add a connection to the data warehouse before you can run queries. You can establish multiple connections with different configurations to the same data warehouse. If multiple connections are configured, you can assign which types of queries are permitted per connection; for details, see the Query Roles section below.
- Expand the drop-down adjacent to the listed data warehouse and click Create Connection (or Add Databricks Connection if you have other existing connections).
- Enter the Name of your Databricks connection.
- Enter the Host location of the Databricks server.
- For Port, enter the port number to route transmitted data. The default port number is 443.
- Enter the Username for connecting to the database. When authenticating with a personal access token, this is the literal string `token` (see Before you begin).
- Enter the user Password for connecting to the database. When authenticating with a personal access token, this is the token value.
- In the Extra JDBC Flags field, enter the JDBC flags that are needed for the connection:
- Access your Databricks cluster to retrieve your JDBC information.
- Make sure all of the following flags are set: `transportMode`, `ssl`, `httpPath`, `AuthMech`, `UID`, `PWD`.
- Optionally, set the `UseNativeQuery=1` flag.
- Here is an example of setting the flags: `transportMode=http;ssl=1;httpPath=<path>;AuthMech=3;UID=token;PWD=<tokenID>;UseNativeQuery=1`
- Test the connection. (A scripted sanity check of the JDBC values is sketched after this list.)
- Click Save to complete the setup.
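To sanity-check the values before pasting them into the connection form, the sketch below assembles the example flag string from the Extra JDBC Flags step and runs a trivial query through the `databricks-sql-connector` package. The host, HTTP path, and token are placeholder assumptions, and this is only a rough stand-in for the Design Center's built-in connection test.

```python
# Minimal sketch: assemble the example JDBC flag string and verify
# that the host/path/token combination can serve a trivial query.
from databricks import sql  # pip install databricks-sql-connector

HOST = "<your-workspace>.cloud.databricks.com"  # placeholder
HTTP_PATH = "<http-path>"                       # placeholder
TOKEN = "<personal-access-token>"               # placeholder

# The string you would paste into the Extra JDBC Flags field:
jdbc_flags = (
    f"transportMode=http;ssl=1;httpPath={HTTP_PATH};"
    f"AuthMech=3;UID=token;PWD={TOKEN};UseNativeQuery=1"
)
print(jdbc_flags)

# Rough equivalent of "Test the connection": run SELECT 1.
with sql.connect(server_hostname=HOST, http_path=HTTP_PATH,
                 access_token=TOKEN) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        assert cur.fetchone()[0] == 1
        print("Connection OK")
```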
Query Roles
If multiple Databricks connections are configured, you can toggle which types of queries (Large Interactive, Small Interactive, or System) are permitted per connection. A query role can be assigned to only one connection at a time.
Expand the drop-down for the Databricks data warehouse to choose the allowed query roles for each connection:
- Large Interactive Queries: Use this engine for large user queries. These are usually BI queries involving full scans of the raw fact table or ones that involve high-cardinality dimensions.
- Small Interactive Queries: Use this engine for small user queries. These are usually BI queries that can be optimized by AtScale aggregates or ones that involve low-cardinality dimensions.
- System Queries: Use this engine for AtScale system processing, such as building aggregate tables or querying system statistics. The System query role user must not be subject to resource limitation rules that would prevent it from building aggregate tables.