Fleet Clusters for Databricks + AWS to Reduce Costs

Show me the money. That’s what it’s all about. I have a question for you, to tickle your ears and mind. Get you out of that humdrum funk you are in. Here is my question, riddle me this, all you hobbits.

“What good is the best and most advanced architecture if your boss’s boss starts looking at the cost and asking questions?”

The thing is, as engineers and developers we love to do fun stuff. We follow the crowds, don’t we? We are like lemmings; we just do what is cool.

One thing is for sure, I love Databricks and Delta Lake. As someone who’s been working hands-on with data for a decade, the power and simplicity of the platforms you can build with Databricks are mind-blowing and game-changing. But there is always a dark side to all things.

Cost. That is the Achilles’ heel of Databricks-based data platforms. You have to be wise in what you do. Too much All-Purpose Cluster and Notebook use? Your costs will double. Using a bunch of Photon instances? Better make sure the runtime improvements make up for the cost … ’cause you gonna pay for it, my friend. Not good with partitions? You will pay for it in queries.

Fleet Clusters on AWS + Databricks to save money.

Today I want to talk about fleet-clusters and give you a quick rundown of what they are and why you should think about adding them to your Databricks Data Platform. I will keep it short and sweet, or try anyways.

What are Fleet Clusters?

“You can use AWS Spot instances at the lowest cost possible, all while ensuring availability, thanks to Databricks’ ability to flex across multiple instances in a single cluster.” – Databricks

What it boils down to is this.

There are two types of EC2 instances that get used a lot on AWS: Spot and On-Demand. Why is this little distinction important?

“Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud and are available at up to a 90% discount compared to On-Demand prices.” – AWS

This has a huge impact on the $$$$ cost $$$$ of a Databricks-based data platform.

If you are a small company or a Data Team that cares about cost, you have a few options when configuring clusters …

You probably use the Databricks API with JSON to set up your clusters. Maybe part of your configuration looks something like this.
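As a minimal sketch of such a config (the cluster name, runtime version, and node counts here are illustrative, not from any real setup):

```json
{
  "cluster_name": "nightly-etl",
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "r5.8xlarge",
  "num_workers": 15,
  "aws_attributes": {
    "availability": "SPOT_WITH_FALLBACK",
    "first_on_demand": 1,
    "spot_bid_price_percent": 100
  }
}
```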

The problem is you have to specify an exact instance type, like r5.8xlarge or something else. You are cost-conscious, so you ask for either SPOT only or SPOT_WITH_FALLBACK. But what’s the chance that your Spark cluster with, say, 15 nodes has Spot availability every day for all 15 of those nodes when you use the cluster all the time?

Probably not so good. You can bet your bottom dollar that on any given day at least a quarter of your instances are On-Demand. Remember the pricing difference between Spot and On-Demand above? It’s a big one.

Especially when you are running Photon-powered stuff at production usage, those flipping clusters end up being 50% or more On-Demand. Dang it! All those sweet Photon features you wanted to implement to make your pipelines run blazingly fast … well … guess what?

Things run fast, but your costs go through the roof! Photon on On-Demand all the time will eat your lunch for you.

Enter fleet clusters.

This is where fleet comes to save us all from ourselves. Instead of saying I want r5.8xlarge, you can say I want r-fleet.8xlarge. And of course, this applies to more than just r instances. You can basically use fleet clusters to stay on Spot instances as much as possible, since you can mix and match different instance sizes inside a single cluster.
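Assuming a config like the one above, the change might be as small as swapping the node_type_id for its fleet equivalent (again, the surrounding values are illustrative):

```json
{
  "cluster_name": "nightly-etl",
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "r-fleet.8xlarge",
  "num_workers": 15,
  "aws_attributes": {
    "availability": "SPOT_WITH_FALLBACK",
    "first_on_demand": 1
  }
}
```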

The best part about all this is that there are very few changes needed on your end. You will have to make some inline policy changes to the AWS IAM role used by your Databricks account. Add all of the below permissions, or you will get errors when trying to launch fleet clusters.
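As a sketch, the additional statements boil down to the EC2 fleet and launch-template actions; check the current Databricks documentation for the authoritative list, but it looks roughly like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FleetPermissions",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateFleet",
        "ec2:DeleteFleets",
        "ec2:DescribeFleetHistory",
        "ec2:DescribeFleetInstances",
        "ec2:DescribeFleets",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:DeleteLaunchTemplate",
        "ec2:DeleteLaunchTemplateVersions",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetLaunchTemplateData",
        "ec2:ModifyLaunchTemplate"
      ],
      "Resource": ["*"]
    }
  ]
}
```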

Of course, time will tell how well this works once everyone starts using it at scale.

Have you used `fleet` clusters in your AWS + Databricks setup? Have you seen cost savings? Let us know!