Announcing new identity and access management enhancements to make Databricks readily accessible to your users

Best practices to scale Databricks for your enterprise
Siddharth Bhai
Lei Ni
Kelly Albano
Anna Shrestinian

We are excited to share new identity and access management features to help simplify the set-up and scale of Databricks for admins. Unity Catalog is at the center of governance on the Databricks Data Intelligence Platform. Part of Unity Catalog is our identity and access management capabilities, designed with the following principles:

  1. Build secure, scalable, and ubiquitous identity and access management for onboarding, managing, and collaborating.
  2. Enable customers to easily control access to Databricks using intuitive, extensible, and audit-ready permissions.
  3. Develop world-class, highly scalable authentication for browser and API access to enable customers and partners to simply and securely leverage the power of the Databricks Data Intelligence Platform.

In this blog, we'll provide a refresher on existing identity and access management features and introduce new investments to simplify the Databricks admin experience. These investments include simple logins from Power BI and Tableau, simplified single sign-on setup via unified login, OAuth authorization, and running jobs using the identity of a service principal as a security best practice.

Seamlessly connect Power BI and Tableau to Databricks on AWS using single sign-on

Power BI and Tableau are two of the most popular third-party data tools on Databricks. The ability to securely connect from Power BI and Tableau to Databricks with single sign-on is now generally available on AWS. Databricks uses OAuth to let users sign in from these tools with single sign-on, which simplifies login and reduces the risk of leaked credentials. OAuth partner applications for Power BI and Tableau are enabled in your account by default.

To get started, check out our docs page or watch this demo video for Power BI.

Authenticate users to Databricks using unified login on AWS

Single sign-on (SSO) is a key security best practice and allows you to authenticate your users to Databricks using your preferred identity provider. At Databricks, we offer SSO across all three clouds. On Azure and GCP, we offer SSO for your account and workspaces by default in the form of Microsoft Entra ID (formerly Azure Active Directory) and Google Cloud Identity, respectively. On AWS, Databricks offers support for a variety of identity providers such as Okta, Microsoft Entra ID, and OneLogin using either SAML or OIDC.

This summer, we introduced unified login, a new feature that simplifies SSO for Databricks on AWS accounts and workspaces. Unified login allows you to manage one SSO configuration for your account and every Databricks workspace associated with it. With SSO enabled on your account, you can turn on unified login for all workspaces or for specific ones. Those workspaces then use the account-level SSO configuration for Databricks access, which simplifies user authentication across your account's workspaces. Unified login is already in use on thousands of production workspaces.

Unified login is GA and enabled automatically on accounts created after June 21, 2023. The feature is in public preview for accounts created before June 21, 2023. To enable unified login, see set up SSO in your Databricks account console.

Automate service principal access to Databricks with OAuth on AWS

We are excited to announce that OAuth for service principals is generally available on AWS. On Azure and GCP, we support OAuth via Azure and Google tokens, respectively. Service principals are Databricks identities for use with automated tools, jobs, and applications. It is a security best practice to use service principals instead of users for production automation workflows for the following reasons:

  • Production workflows that run using service principals are not impacted when users leave the organization or change roles.
  • If all processes that act on production data run using service principals, interactive users do not need any write, delete, or modify privileges in production. This eliminates the risk of a user overwriting production data by accident.
  • Using service principals for automated workflows enables users to better protect their own access tokens.

OAuth is an open standard protocol that authorizes users and service accounts to APIs and other resources without revealing the credentials. OAuth for service principals uses the OAuth client credentials flow to generate OAuth access tokens that can be used to authenticate to Databricks APIs. OAuth for service principals has the following benefits for authenticating to Databricks:

  • Uses Databricks service principals, instead of users, for authentication.
  • Uses short-lived (one-hour) access tokens for credentials to reduce the risk of credentials being leaked.
  • Expired OAuth access tokens can be automatically refreshed using Databricks tools and SDKs.
  • Can authenticate to all Databricks APIs that the service principal has access to, at both the account and workspace level. This enables you to automate the creation and setup of workspaces in one script.
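To make the client credentials flow described above concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the official client: the workspace token endpoint path `/oidc/v1/token` and the `all-apis` scope are assumptions based on our understanding of the documented workspace-level endpoint, and the helper names (`build_token_request`, `fetch_token`, `TokenCache`) are hypothetical. In practice, the Databricks SDKs and tools handle token refresh for you.

```python
import base64
import json
import time
import urllib.parse
import urllib.request


def build_token_request(host, client_id, client_secret, scope="all-apis"):
    """Build (but do not send) an OAuth client-credentials token request.

    Assumption: the workspace token endpoint is <host>/oidc/v1/token and
    the service principal's client ID and secret go in HTTP Basic auth.
    """
    url = f"{host.rstrip('/')}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_token(request):
    """Send the token request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


class TokenCache:
    """Reuse a short-lived access token and refetch it shortly before expiry."""

    def __init__(self, fetch, clock=time.time, skew_seconds=60):
        self._fetch = fetch          # callable returning the token response dict
        self._clock = clock          # injectable clock, handy for testing
        self._skew = skew_seconds    # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            payload = self._fetch()
            self._token = payload["access_token"]
            # Fall back to one hour if the response omits expires_in.
            self._expires_at = self._clock() + float(payload.get("expires_in", 3600))
        return self._token
```

A caller could wire these together as `TokenCache(lambda: fetch_token(build_token_request(host, client_id, client_secret)))` and send the result of `get()` as a `Bearer` token on each API call. Keeping request construction separate from sending makes the flow easy to inspect without touching the network.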

To use OAuth for service principals, see Authentication using OAuth for service principals.

Securely run Databricks jobs as a service principal

Databricks Workflows orchestrates data processing, machine learning, and analytics pipelines in the Databricks Data Intelligence Platform. A Databricks job is a way to run your data processing and analysis applications in a Databricks workspace. By default, jobs run as the identity of the job owner. This means that the job assumes the permissions of the job owner and can only access data and Databricks objects that the job owner has permission to access.

We are excited to announce that you can now change the identity a job runs as to a service principal. The job then assumes the permissions of that service principal instead of the owner's, so it is not affected when a user leaves your organization or switches departments. Running a job as a service principal is generally available on AWS, Azure, and GCP. Check out Run a job as a service principal in the docs to get started.
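If you manage jobs through the REST API rather than the UI, the run-as change might look roughly like the sketch below. This is a hedged illustration, not the documented procedure: it assumes the Jobs API 2.1 `jobs/update` endpoint accepts a `run_as` setting whose `service_principal_name` field holds the service principal's application ID, and the function name `build_run_as_request` is hypothetical. Check the Jobs API reference for your cloud before relying on it.

```python
import json
import urllib.request


def build_run_as_request(host, token, job_id, sp_application_id):
    """Build (but do not send) a request that sets a job's run-as identity.

    Assumptions: POST <host>/api/2.1/jobs/update with a partial
    `new_settings` object, and `run_as.service_principal_name` set to the
    service principal's application ID.
    """
    payload = {
        "job_id": job_id,
        "new_settings": {
            "run_as": {"service_principal_name": sp_application_id},
        },
    }
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.1/jobs/update",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen(...)` would apply the change. The caller needs sufficient permissions on both the job and the service principal.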

"Running Databricks workflows using service principals allows us to separate the workflows permissions, their execution, and their lifecycle from users, and therefore making them more secure and robust"
— George Moldovan, Product Owner, Raiffeisen Bank International

Identity and access management best practices on Databricks

At Databricks, we are committed to scaling with you as your organization grows. We covered a lot in today's blog, highlighting our key investments in our identity and access management platform via Unity Catalog on Databricks. With a slew of new identity and access management features now available, you might wonder what "good" looks like as you build your data governance strategy with Databricks.

We recommend you check out our identity and access management docs pages for the latest best practices (AWS | Azure | GCP) or watch our Data + AI Summit 2023 session "Best Practices for Setting Up Databricks SQL at Enterprise Scale".
