Skip to main content

Adding Amazon Redshift Data Warehouses

An Amazon Redshift data warehouse is an Amazon Redshift cluster and database that contains the tables and views that you want to access as model facts and dimensions. It also contains aggregate table instances that either you or the AtScale engine creates in a schema that you specify.

Prerequisites

Before adding a Redshift data warehouse, ensure you have the following:

  • The datawarehouses_admin role assigned in the Identity Broker.
  • The schema to use for building aggregate tables in the data warehouse. AtScale reads and writes aggregate data to this schema, so the AtScale service account user must have ALL privileges for it. BI tool user accounts should not have the select permission for this schema.
  • If you want to use trigger files to initiate aggregate builds, you must have an S3 bucket for AtScale to connect to. This does not need to be publicly visible, but the AtScale services will require access to it.

Also be aware of the following when using AtScale with a Redshift data warehouse:

  • AtScale routinely updates the CA certificates that it ships with, making it unlikely that you will have to update the AtScale JVM with the Amazon ACM certificates. However, if the certificate changes in the future before you can install another AtScale upgrade, you will need to import the new certificate. For detailed information on Redshift SSL configuration options and certificates, see Configuring security options for connections in the Amazon documentation.
  • If you want to apply at-rest encryption to your data, refer to the Amazon Redshift database encryption documentation.
  • Redshift doesn't support use of LN or LOG functions with NUMERIC or DECIMAL columns, as described in LN function in the Amazon documentation. If this limitation affects your AtScale models, open a ticket with AWS Redshift.

Add an IAM role for AtScale

Before you can connect to a Redshift data warehouse, you need to create an IAM role for AtScale. Use the GRANT command to define the following Redshift privileges:

  • For the aggregate schema: SELECT, UPDATE, CREATE, DROP
  • For all other tables/schemas AtScale will have access to: SELECT

If you plan on connecting an S3 bucket, grant the AtScale IAM role read/write access to it:

{
"Version": "<version>",
"Statement": [
{
"Sid": "ListObjectsInBucket",
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::<bucket-name>"]
},
{
"Sid": "AllObjectActions",
"Effect": "Allow",
"Action": "s3:*Object",
"Resource": ["arn:aws:s3:::<bucket-name>/*"]
}
]
}

Add a Redshift data warehouse

To add a Redshift data warehouse:

  1. In Design Center, open the Datasources panel.

  2. Click the + icon and select Redshift.

    image

    The Add Data Warehouse panel opens.

  3. Enter a Name for the data warehouse.

  4. Enter the External connection ID for the data warehouse. You can optionally enable the Override generated value? toggle to default this value to the name of the data warehouse.

  5. In the Database field, enter the name of the database to use when creating aggregate tables. This must already exist; AtScale does not create it automatically.

  6. Enter the name of the Aggregate Schema to use when creating aggregate tables. This must already exist; AtScale does not create it automatically.

  7. In the Access Controls section, add the users and groups that you want to be able to access the data warehouse and its data. For more information on how data warehouse security works in AtScale, see About Data Warehouse Security.

  8. Add a connection to the data warehouse. You must have at least one defined to finish adding the data warehouse.

    1. In the Connections section, click Connection Required. The Add Connection panel opens.

    2. Complete the following fields:

      • Name: The name of the connection.
      • Host: The location of the Databricks server.
      • Port: The port number to route transmitted data.
      • Username: The username AtScale uses when connecting to the data warehouse.
    3. Select the authentication method to use when connecting to the data warehouse, and enter the required credentials:

      • Use Access Keys: Enter the Access key id and Secret access key. These values are saved in the AtScale database. The access key is saved in plain text, while the secret access key is encrypted.
      • Use Password: Enter the Password. This value is encrypted and saved in the AtScale database.
    4. (Optional) Specify any extra JDBC flags needed to customize the Redshift driver's behavior. For example, see Configuring security options for connections in the Amazon documentation for optional settings you can enter to customize SSL behavior.

    5. (Optional) Click Test Connection to test the connection.

    6. Click Add

      The connection is added to the data warehouse. You can add more connections as needed.

  9. (Optional) If your data warehouse has multiple connections, you can configure their query role mappings. This determines which types of queries are permitted per connection. A query role can be assigned to only one connection at a time.

    In the Query Mapping section, select the connection you want to assign to each type of query:

    • Small Interactive Queries: Small user queries. These are usually BI queries that can be optimized by AtScale aggregates, or ones that involve low-cardinality dimensions.
    • Large Interactive Queries: Large user queries. These are usually BI queries involving full scans of the raw fact table, or ones that involve high-cardinality dimensions.
    • Aggregate Build Queries: Queries made when building aggregate tables. This mapping should be assigned to a connection that has more compute power, and that is not subject to resource limitations that would prevent it from building aggregate tables.
    • System Queries: AtScale system processing, such as running aggregate maintenance jobs and querying system statistics.
  10. Click Save Data Warehouse.