Regularly releasing updates to the App Store and Play Store is more complex than you might expect, especially for teams at scale and even more so when there are multiple apps to ship. There are so many ways to navigate release complexities that no two teams do everything the same way.

It’s intriguing to see how other teams work. Discerning similarities and differences between teams can help reveal valuable new approaches. In that spirit, and drawing on six years of experience with the release processes for both the DoorDash and Caviar consumer apps, here is a high-level overview of DoorDash’s mobile release management.

Note: At DoorDash, we ship multiple Android and iOS apps and each team handles releases differently. This article focuses on DoorDash’s consumer iOS team and its release processes.

A general overview of the release cycle

At a high level, our release cycles are similar to those found at many other companies. Here’s a quick look at how we manage things: 

  1. We ship a new app version every week, which means we have a release cut weekly.
  2. Testing begins when a release is cut. At that point, we have up to a week to fix any critical regressions, which we do by cherry-picking fixes into the release branch (a sketch of the cut-and-cherry-pick flow follows this list).
  3. We submit for App Store review after testing and any fixes are complete, ideally mid-week to create a buffer for any potential rejections or unexpectedly long review times.
  4. Once approved, the build waits for release until the end of the week.
  5. On release day, we begin a phased rollout to 1% of users to ensure no other major issues surface that might have been missed during testing.
  6. After closely monitoring release health, we accelerate the rollout to all users on Day 3.
  7. At this point, the subsequent release is already in the works, so it’s a matter of rinse and repeat.
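
To make steps 1 and 2 concrete, here is a minimal sketch of what a scripted weekly cut and cherry-pick might look like in a trunk-based Git repo. The branch naming and helper functions are hypothetical illustrations, not our actual tooling.

```python
# A rough sketch of the weekly release cut and in-cycle cherry-pick flow,
# assuming a trunk-based repo whose default branch is `main`.
import subprocess

def run(*args: str) -> None:
    subprocess.run(args, check=True)

def cut_release(version: str) -> None:
    """Cut the weekly release branch from the tip of main."""
    branch = f"release/{version}"
    run("git", "fetch", "origin", "main")
    run("git", "checkout", "-b", branch, "origin/main")
    run("git", "push", "-u", "origin", branch)

def cherry_pick_fix(version: str, sha: str) -> None:
    """Fixes land on main first; approved ones are cherry-picked over."""
    run("git", "checkout", f"release/{version}")
    run("git", "cherry-pick", "-x", sha)  # -x records the original commit SHA
    run("git", "push", "origin", f"release/{version}")

if __name__ == "__main__":
    cut_release("5.0.0")
```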

The following sections drill down a bit deeper into how we work, particularly with regard to collaboration and the human side of our processes.

Release management responsibilities

A small team of volunteers rotates through the release manager role. Each on-duty release manager is in charge of making sure that the current release rolls out smoothly and that the subsequent release makes it through App Store approval before its scheduled release date.

We deliberately keep the number of release managers relatively low: enough to spread the load, but not so many that anyone gets out of practice. Keeping the numbers in check helps keep decisions consistent, especially for the inevitable judgment calls. Managers are not only kept up to date on recent process changes and improvements; they’re part of deciding how those processes should change in the first place.

Managing communication

To maintain clear communication and keep organized throughout releases and rollouts, we create a Slack channel specific to each release — for instance #ios-release-5-0-0. This centralizes release-specific status updates and conversations in one place, reducing noise in other shared channels and making it easy to look up details from a past release if needed.
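
As a concrete illustration of spinning up one of these channels, here is a minimal sketch using Slack’s official Python client, slack_sdk. The token environment variable and welcome message are assumptions for the example, not our exact tooling.

```python
# A minimal sketch of per-release Slack channel setup using slack_sdk
# (Slack's official Python client); SLACK_BOT_TOKEN is an assumed env var.
import os
from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def create_release_channel(version: str) -> str:
    # Slack channel names allow hyphens but not dots, hence "5-0-0".
    name = f"ios-release-{version.replace('.', '-')}"
    channel_id = client.conversations_create(name=name)["channel"]["id"]
    client.chat_postMessage(
        channel=channel_id,
        text=f"Release {version} has been cut. All updates for this release go here.",
    )
    return channel_id
```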

The dedicated Slack channel is especially helpful when hotfixes enter the mix. Because we have a weekly release schedule, shipping a hotfix for an issue in production means that there are two separate releases in progress simultaneously. Isolating each release’s communication in its own channel prevents confusion. For instance, in the 5.0.0 example, anything related to the 5.0.1 hotfix would be discussed in #ios-release-5-0-0 while matters related to the upcoming 5.1.0 release would be addressed in #ios-release-5-1-0.

Testing release candidates

Our apps have grown so large that it’s impossible for any one person, or even a few people, to fully own release candidate testing. Our dedicated higher-level QA team can’t easily take on intensive weekly regression testing as well. And given the volume and pace of continuous changes and new feature development throughout the app, it’s difficult to ensure that the right things are being tested — and tested correctly. The people actually building features and making changes are in the best position to know what’s new, what’s different, and how to test it all properly.

That’s why our release candidate testing relies on a group of engineers we call “component owners.” Each of them is responsible for testing their own component — a feature or area of the product — in the release candidate and fixing or delegating fixes for any regressions detected during testing. Components usually map one-to-one with our product teams — for example, the login team is responsible for running testing related to the app’s login component. Each component owner has specific tests that they must run before approving the component. And every single component must be approved before the release can be submitted for review.
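
To sketch the sign-off model this implies (each component has an owner and required tests, and submission is gated on every approval), something like the types below could represent it. The names and fields are hypothetical illustrations, not Runway’s schema or our internal one.

```python
# A hypothetical model of component sign-off; every component must be
# approved before the release candidate can be submitted for review.
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REGRESSION_FOUND = "regression_found"

@dataclass
class Component:
    name: str                  # e.g. "login"
    owner: str                 # engineer responsible for testing this area
    required_tests: list[str]  # tests to run before approving
    status: Status = Status.PENDING

def ready_to_submit(components: list[Component]) -> bool:
    return all(c.status is Status.APPROVED for c in components)
```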

Of course, making sure all component owners have signed off before submission and figuring out who is still testing can get complicated. We use a mobile release management platform called Runway to make collaboration easier throughout the week. It captures the current status of component testing and can also automatically remind component owners through Slack to complete their tests.
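
Runway handles this tracking and nudging for us out of the box. Purely to illustrate the kind of automation involved, a home-grown reminder built on the Slack client and component model sketched above might look like this:

```python
# An illustrative reminder loop, reusing the WebClient, Component, and Status
# sketches above; owner values are assumed to be Slack user IDs.
from slack_sdk import WebClient

def remind_pending_owners(
    client: WebClient, channel: str, components: list[Component]
) -> None:
    for c in components:
        if c.status is Status.PENDING:
            client.chat_postMessage(
                channel=channel,
                text=f"<@{c.owner}> Reminder: `{c.name}` still needs release candidate sign-off.",
            )
```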

Handling regressions and hotfixes

Although we strive for seamless releases — test, approve, and release — that’s obviously not possible 100% of the time. Sometimes we catch regressions during testing that need to be fixed before we can release the app. Diagnosing a bug takes time, and fixing it means modifying code late in the cycle, which introduces risk, so it’s important to carry out release candidate testing as early in the week as possible. That way we have enough time to find the right solution and thoroughly test the fix to make sure our changes don’t break something else. Given the scale and nature of what we build at DoorDash, even small issues can have a big impact, so we approach regressions and hotfixes with rigor and care.

If a component owner identifies a regression during release candidate testing, they work with their team to triage the problem and then develop a fix on our main branch via a regular pull request. After the fix is merged, it may or may not be eligible for cherry-picking onto the release branch. Because late-arriving changes are risky, we can’t simply allow everything into the release; strict requirements must be met before any change can be added. For example, we allow fixes for regressions or new bugs that have a measurable effect on the user experience, but we don’t take fixes for minor bugs that don’t affect the user — and we certainly don’t use this process to squeeze in new features that missed the release cut. To escalate fixes for possible inclusion in a release, teams submit fix requests, including explanations and evidence, which the release manager then reviews.
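
To make that gate concrete, here is one way the baseline criteria could be encoded. The fields and checks are illustrative assumptions; in practice the release manager reviews each request and makes the final call.

```python
# A hypothetical encoding of the cherry-pick eligibility criteria described
# above; it captures the baseline rules, not the release manager's judgment.
from dataclasses import dataclass

@dataclass
class FixRequest:
    pr_url: str
    is_regression_or_new_bug: bool   # not a new feature that missed the cut
    user_impact_evidence: str        # measurable effect on the user experience
    verified_on_release_branch: bool

def eligible_for_cherry_pick(req: FixRequest) -> bool:
    return (
        req.is_regression_or_new_bug
        and bool(req.user_impact_evidence)
        and req.verified_on_release_branch
    )
```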

Post-release, the process for creating a hotfix is similar to the in-cycle cherry-pick request flow, except the criteria for allowing a hotfix are much stricter: spinning one up requires significantly more work and can impact the normal release hot on its heels. If we find a bug late in a release cycle — say, between when the app was submitted for review and its release — the decision on whether to fix it depends on the same strict criteria used for post-release hotfixes. Although the update is not yet public, it may be waiting for review or even approved already, so implementing a fix means rejecting the build and resubmitting the app. Because this could delay the release, we evaluate whether a fix is merited, and how to approach it, on a case-by-case basis. We can developer-reject the build, apply the fix, and resubmit; we can let the release ship on schedule and immediately spin up a hotfix afterward; or we can decide the bug isn’t impactful enough to warrant a hotfix and defer the fix to the next release cycle.

Monitoring rollouts post-release

Post-release, we rely on component owners and their teams to keep an eye out for any issues that may arise in their areas of responsibility. Each team has a unique set of key metrics they own and monitor, which makes them best-equipped to understand what may be going wrong with their components in the new release.

But release managers aren’t completely off the hook. They must watch higher-level measures of health, like the new release’s crash rate and any trending issues. We use Sentry to keep track of the apps’ crashes, and because we can integrate it with Runway, we have a single source of truth for observing app health closely. If a release manager sees something unusual, they can ask component owners to take a deeper look and make fixes as needed. But if no problems arise, we can automatically proceed to a full rollout.
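
As one example of that final step, accelerating a phased release to all users can be done programmatically through Apple’s App Store Connect API by setting the phased release state to COMPLETE. The sketch below assumes a pre-built API token (JWT generation elided) and a known phased-release resource ID, with error handling omitted.

```python
# A minimal sketch of accelerating an App Store phased release to 100%
# via the App Store Connect API; `token` is an assumed, pre-signed JWT.
import requests

API = "https://api.appstoreconnect.apple.com/v1"

def complete_phased_release(token: str, phased_release_id: str) -> None:
    # COMPLETE releases the build to all users instead of continuing
    # Apple's default seven-day phased ramp.
    requests.patch(
        f"{API}/appStoreVersionPhasedReleases/{phased_release_id}",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "data": {
                "type": "appStoreVersionPhasedReleases",
                "id": phased_release_id,
                "attributes": {"phasedReleaseState": "COMPLETE"},
            }
        },
    ).raise_for_status()
```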

Conclusion

As described here, releasing mobile apps at scale takes quite a bit of work and coordination. Keeping releases on schedule requires stakeholders across many teams to test and safeguard quality, while centralizing control with release managers keeps the process consistent and efficient. This setup allows us to maintain a weekly release cadence across multiple apps while keeping quality high and team members happy.