Skip to main content

Adding Databricks Data Warehouses

A Databricks data warehouse contains the tables and views that you want to import as fact datasets and dimension datasets into AtScale models. It also contains aggregate table instances that either you or the AtScale engine creates in a database that you specify.

info

To configure a Databricks data warehouse, you must have a special AtScale license. Ask AtScale Support about obtaining a Databricks-enabled license.

Prerequisites

AtScale prerequisites

  • You must have the datawarehouses_admin or datawarehouses_manage realm role assigned in the Identity Broker.
  • You must know the schema to use for building aggregate tables in the data warehouse. AtScale reads and writes aggregate data to this schema, so the AtScale service account user must have ALL privileges for it. BI tool user accounts should not have the select permission.

Databricks prerequisites

  • Use a supported version of Databricks. For more information, see Supported Tools and Platforms.
  • When using Databricks SQL or Unity Catalog, make sure they are configured as needed. For details, see Databricks SQL guide and What is Unity Catalog.
  • Since Databricks does not support JDBC authentication with username and password, you have to generate a personal access token. Databricks recommends using access tokens belonging to service principals; for more information about obtaining them, see Manage service principals. When creating the Databricks connection (described below), use the token as the password, and token as the username.
Note

Alternatively, you can use Databricks Partner Connect. This way Databricks provisions the required resources like a service principal, a personal access token, a SQL warehouse on your behalf. For more information, see Databricks' Atscale Partner Connect documentation in AWS or Azure.

Add a Databricks data warehouse

Important

AtScale does not support impersonation for Databricks data warehouses. When adding a Databricks data warehouse, do not select the Enable to impersonate the client when connecting to data warehouse option.

  1. In Design Center, open the Datasources panel.

  2. Click the + icon and select Databricks SQL.

    The Add Data Warehouse menu of the Datasources panel.

    The Add Data Warehouse panel opens.

  3. Enter a unique Name for the data warehouse. AtScale displays this name in the Design Center and uses it in log files.

  4. Enter the External connection ID for the data warehouse. You can optionally enable the Override generated value? toggle to default this value to the name of the data warehouse.

  5. Enter the name of the Aggregate Catalog, which AtScale uses when creating aggregate tables. You must create this; AtScale does not create one automatically. Also, be aware of the following:

    • If the catalog name contains non-alphabet characters, you should surround the name with backtick characters (`).
    • AtScale supports Databricks SQL clusters with both Unity Catalog enabled and disabled, and thus three-part namespace support is enabled for Databricks SQL.
  6. Enter the name of the Aggregate Schema for AtScale to use when creating aggregate tables. You must create this schema; AtScale does not create one automatically.

    If Unity Catalog is disabled, the aggregate schema will reside in the default catalog hive_metastore. If Unity Catalog is enabled, it's the user's choice to decide in which Catalog-->Schema the aggregates can be stored.

  7. In the Access Controls section, add the users and groups that you want to be able to access the data warehouse and its data. For more information on how data warehouse security works in AtScale, see About Data Warehouse Security.

  8. Add a connection to the data warehouse. You must have at least one defined to finish adding the data warehouse.

    1. In the Connections section, click Connection Required. The Add Connection panel opens.

    2. Complete the following fields:

      • Name: The name of the connection.
      • Host: The location of the Databricks server.
      • Port: The port number to route transmitted data.
    3. In the Authorization section, complete the following fields:

      • Authentication Method: Select Password.
      • Username: Enter token.
      • Password: Enter your Databricks personal access token (PAT).
    4. (Optional) In the Additional Settings section:

      1. Access your Databricks cluster and make sure the following flags are set: transportMode, ssl, httpPath, AuthMech, UID, PWD, and (optionally) UseNativeQuery=1.
      2. In the Extra jdbc flags field, enter all of the above flags for the connection (except httpPath, which will be set in the next step). For example: transportMode=http;ssl=1;AuthMech=3;UID=token;PWD=<tokenID>;UseNativeQuery=1
      3. In the Http path field, enter the httpPath flag. For example: httpPath=<path>
    5. Click Test connection to test the connection.

    6. Click Add.

      The connection is added to the data warehouse. You can add more connections as needed.

  9. (Optional) If your data warehouse has multiple connections, you can configure their query mappings. This determines which types of queries are permitted per connection. A query role can be assigned to only one connection at a time.

    In the Query Mapping section, select the connection you want to assign to each type of query:

    • Small Interactive Queries: Small user queries. These are usually BI queries that can be optimized by AtScale aggregates or ones that involve low-cardinality dimensions.
    • Large Interactive Queries: Large user queries. These are usually BI queries involving full scans of the raw fact table or ones that involve high-cardinality dimensions.
    • Aggregate Build Queries: Queries made when building aggregate tables. This mapping should be assigned to a connection that has more compute power, and that is not subject to resource limitations that would prevent it from building aggregate tables.
    • System Queries: AtScale system processing, such as running aggregate maintenance jobs and querying system statistics.
  10. Click Save Data Warehouse.

Additional information

For additional information on what you can do with AtScale and Databricks, see the Using AtScale with Databricks Genie for Semantic Analytics at Scale video.