Using rideshare data to evaluate racial bias in the issuance of speeding citations

Pradhi Aggarwal
Lyft Engineering
December 16, 2022

The disproportionate impact of policing on communities of color¹ is a central social and policy concern in the United States, and a topic of intense study in academia. Lyft is uniquely positioned to contribute to this discourse, and to the academic literature on the topic, using data from the large number of trips on our rideshare platform. The research team analyzed driving patterns and contextual features around traffic stops and found statistically and economically significant differences by race in speeding citations and fines that cannot be explained by differences in driving behavior. This article summarizes a more detailed academic paper that can be downloaded for free here.

Prior research examining the effect of driver race on outcomes like vehicle searches, leniency in traffic citations, and use of force increasingly reports evidence that drivers from communities of color face harsher punishments compared to white drivers. Researchers have found that drivers of color are more likely to be cited for speeding², less likely to receive a discount on speeding tickets³, and more likely to be convicted of a misdemeanor for speeding⁴ than white drivers.

However, there are two fundamental challenges with this research — both of which can be mitigated using ridesharing data. First, there may be a selection bias in the data. Prior studies use police-reported data on drivers who experience a certain outcome, such as being issued a speeding ticket, and examine inequities by racial group in post-stop outcomes such as leniency in fines or the use of force. However, if there is non-random selection into the set of people who are stopped, such methods are likely to underestimate bias. Second, researchers are usually unable to observe the alleged offender’s behavior that the police are able to observe. In the case of speeding citations, researchers usually only have access to police-reported speeds, which may not always be an accurate representation of the speed the driver was actually traveling. The lack of information on true driver speeds makes it hard to parse out the factors involved in the police officer’s decision. Using driver location data addresses both of these issues. The researchers were able to observe a much broader set of driving behavior, including when people are speeding but not stopped by the police, with which one can:

  1. Analyze the unconditional rate at which people are cited
  2. Analyze citations with a measure of the driver’s behavior right before the police intervened (i.e. how much they were actually speeding)

We outline our methodology and findings below.

Data

Our research team’s analysis used several sources of data. First, we obtained records of traffic violations in Florida via a Freedom of Information Act request to the Florida Court Clerks and Comptrollers. These records included the approximate time of each speeding citation, the associated fine, and the driver’s license number. Second, we used Lyft data on contextual features such as vehicle characteristics and trip time, along with GPS pings to infer location and speed. Third, we obtained the speed limits drivers encounter, along with data on road features, from the Florida Department of Transportation’s (FDOT) Open Data Hub. In combination with our GPS data, this allowed us to measure when, and by how much, a driver was speeding. Finally, we used public voter records from the Florida State Election Board that contain self-reported driver race. Combining these datasets, the team analyzed traffic stops that occurred in Florida from August 2017 to August 2020 and affected drivers while they were online on Lyft’s platform.
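For readers who want a concrete picture of how such datasets could be linked, the sketch below joins hypothetical versions of the four sources on a shared driver identifier and road segment. All file names, column names, and join keys are illustrative assumptions, not the actual schemas used in the paper.

```python
import pandas as pd

# Hypothetical file and column names; a simplified stand-in for the actual linkage.
citations = pd.read_csv("fl_citations.csv", parse_dates=["citation_time"])        # FOIA court records
sessions  = pd.read_csv("lyft_sessions.csv", parse_dates=["session_start", "session_end"])
voters    = pd.read_csv("fl_voter_records.csv")                                   # self-reported race
limits    = pd.read_csv("fdot_speed_limits.csv")                                  # road segment -> posted limit

# Attach self-reported race and posted speed limits to each driving session.
sessions = sessions.merge(voters[["license_number", "race"]], on="license_number")
sessions = sessions.merge(limits, on="road_segment_id", how="left")

# Flag sessions during which the same driver received a speeding citation.
linked = sessions.merge(citations, on="license_number", how="left")
linked["cited"] = linked["citation_time"].between(
    linked["session_start"], linked["session_end"]
).astype(int)
```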

While drivers in Florida experienced tens of thousands of traffic stops over that time period, relatively few of them (thankfully) occurred during a Lyft driving session. Our data consist of hundreds of millions of driving sessions but only 1,423 stops, a small number that limits our statistical power. Moreover, traditional methods like logistic regression can sharply underestimate the probability of ‘rare events’, i.e. binary dependent variables with dozens to thousands of times fewer ones than zeros (such as wars, political vetoes, epidemiological infections, or in our case, traffic stops). To address this issue, and to cull our data to a more tractable size, we used a sampling trick. Rather than sampling driver sessions completely at random, we used a simple choice-based sampling technique⁵: keep all the 1s (citations) and a random selection of 0s (instances where a driver was driving for Lyft but not cited), resulting in over 19 million observations. We then corrected the estimator using weights that account for each observation’s sampling probability.
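As an illustration, the sketch below implements choice-based sampling with a weighting correction in the spirit of King and Zeng (2001): keep every cited session, subsample the non-cited sessions, and weight each observation by the inverse of its sampling probability. The `sessions` DataFrame, the control sampling fraction, and the feature columns are all hypothetical stand-ins rather than the paper’s actual specification.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

CONTROL_FRACTION = 0.01  # assumed share of non-cited sessions retained

# Keep all 1s (citations) and a random subset of 0s (non-cited sessions).
cases    = sessions[sessions["cited"] == 1]
controls = sessions[sessions["cited"] == 0].sample(frac=CONTROL_FRACTION, random_state=0)
sample   = pd.concat([cases, controls])

# Inverse-probability weights: cases were sampled with probability 1,
# controls with probability CONTROL_FRACTION.
weights = np.where(sample["cited"] == 1, 1.0, 1.0 / CONTROL_FRACTION)

# Weighted logistic regression on illustrative controls; proper inference
# would also use robust (sandwich) standard errors.
X = sm.add_constant(sample[["pct_time_10_over", "night", "vehicle_age"]])
fit = sm.GLM(sample["cited"], X, family=sm.families.Binomial(), freq_weights=weights).fit()
print(fit.summary())
```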

Findings

We started by looking at baseline speeding behavior by race group. We plotted the proportion of time drivers spent driving in different buckets of speed relative to the speed limit, adjusted for contextual features like location and time. Both white drivers and drivers of color traveled at similar speeds relative to the speed limit and rarely traveled more than 10 mph over the legal limit. While white drivers were slightly more likely to drive 0–9 mph over the speed limit, we found no statistically significant differences for speeding more than 10 mph over the limit, the threshold that triggers substantial fines.

Figure 1: This figure plots the proportion of GPS pings (location coordinates communicated periodically by the Lyft driver app) in different speeding buckets. Speeding buckets are constructed by comparing the driving speed inferred by us to the speed limit reported by the FDOT.
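A minimal sketch of how such buckets could be constructed from ping-level data is below. Here `pings` is a hypothetical DataFrame with an inferred speed and the matched FDOT speed limit for each GPS ping, and the bucket edges only approximate the ones used in Figure 1.

```python
import pandas as pd

def speeding_bucket(row):
    # Difference between inferred driving speed and the posted limit (mph).
    over = row["speed_mph"] - row["speed_limit_mph"]
    if over <= 0:
        return "at or under limit"
    elif over < 10:
        return "0-9 mph over"
    elif over < 20:
        return "10-19 mph over"
    else:
        return "20+ mph over"

pings["bucket"] = pings.apply(speeding_bucket, axis=1)

# Share of pings in each bucket, within each race group (cf. Figure 1).
shares = (
    pings.groupby(["race", "bucket"]).size()
         .groupby(level="race")
         .transform(lambda s: s / s.sum())
)
```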

Having validated that overall speeding behavior was similar across races, we turned to our central analysis: are drivers of color cited disproportionately compared to white drivers?

To estimate this effect, we used two empirical strategies. The first was a fixed effects (FE) model similar to those commonly used in the bias literature: we estimated fixed effects for driver and vehicle characteristics, along with the speeding buckets used in Figure 1, to hold all else equal when comparing citation rates and fines between white drivers and drivers of color. Second, because we have such a large dataset, we were able to leverage a double machine learning (DML) approach to more flexibly control for potential confounders and strengthen the causal interpretation of our findings.
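To make the second strategy concrete, the sketch below shows one common flavor of DML, a cross-fitted partialling-out estimator in the spirit of Chernozhukov et al. (2018). It is not the paper’s exact specification: the nuisance learners, the outcome (citations per hour of driving), the treatment indicator (driver of color), and the controls (speeding-bucket shares plus contextual features) are all illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def dml_race_gap(Y, D, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out DML; Y, D, X are numpy arrays."""
    y_resid = np.zeros(len(Y))
    d_resid = np.zeros(len(D))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Nuisance models: E[Y | X] and E[D | X], fit on the training fold.
        m_y = GradientBoostingRegressor().fit(X[train], Y[train])
        m_d = GradientBoostingClassifier().fit(X[train], D[train])
        # Residualize on the held-out fold.
        y_resid[test] = Y[test] - m_y.predict(X[test])
        d_resid[test] = D[test] - m_d.predict_proba(X[test])[:, 1]
    # Final stage: outcome residuals on treatment residuals, robust SEs.
    return sm.OLS(y_resid, d_resid).fit(cov_type="HC1")

# Example: dml_race_gap(Y, D, X).params[0] is the adjusted citation-rate gap.
```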

Regardless of the model estimated, drivers from communities of color were significantly more likely to be cited for speeding and, as a result, paid more in fines. The top two panels below plot the average number of citations and fines for both groups of drivers after accounting for the amount of speeding and other control variables. Drivers of color had a 24 to 33 percent higher chance of being cited and paid 23 to 34 percent more in fines per unit of driving time than white drivers, even after controlling for various driver, vehicle, and contextual features in our data.

Figure 2: The top two panels plot the regression-adjusted means of citations and fines per 10,000 hours of driving for each race group of drivers. These estimates are computed over our entire dataset, unconditional on the driver being cited. The bottom two panels plot the differences between white drivers and drivers of color. ** p-value < 0.05

These results provide the first estimates of racial inequities in traffic-related police punishment that account for the driver’s behavior. Our analysis indicates that differences in policing outcomes across races cannot be explained by differences in driver behavior previously unobserved by researchers. In other words, a driver of color was likely to face harsher punishment than a white driver with the same speeding behavior and same contextual features. Our granular location controls also allow us to rule out the possibility that these effects were solely driven by overpolicing in neighborhoods where the majority of the population belongs to communities of color; we found that on average even within a given location, regardless of that location’s demographics, drivers of color were more likely to be cited and paid higher fines.

Reoffense Rates

A claim sometimes made in the racial inequity literature is that police cite and penalize drivers from communities of color more harshly because of differences in reoffense rates. Past recidivism research has suggested that offenders from communities of color are more likely to recidivate⁶ for a variety of criminal offenses. If officers believe that citations might be more effective in deterring drivers of color from reoffending, they may cite such offenders at a higher rate to prevent speeding in the future. However, recidivism may conflate the rate of reoffense (i.e. actually committing the crime again) with the rate of getting caught again.

We compared the proportion of time spent speeding by race before and after a citation to understand actual behavior changes in response to a citation, rather than just looking at whether drivers were cited again. The graph on the left plots the proportion of weekly driving time a driver spent speeding more than 10 mph over the speed limit, relative to the week the driver was cited. After week 0 (the week the driver was stopped), both drivers of color and white drivers reduced how much they sped. However, this change in behavior did not differ by race. The panel on the right plots the difference-in-differences estimator with 95-percent confidence intervals, and we found no effect. In other words, citations did have a deterrent effect on future speeding, but not differentially by race.
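A hedged sketch of this comparison as a regression on driver-week data is below. The `panel` DataFrame, its column names, and the clustering choice are assumptions for illustration; the coefficient on the interaction term plays the role of the difference-in-differences estimate plotted in the right panel of Figure 3.

```python
import statsmodels.formula.api as smf

# `panel`: one row per driver-week, with the share of driving time spent
# 10+ mph over the limit, a post-citation indicator, a driver-of-color
# indicator, and calendar-week fixed effects. All names are illustrative.
did = smf.ols(
    "share_10_over ~ post_citation * driver_of_color + C(week)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["driver_id"]})

# A near-zero interaction coefficient means the post-citation reduction in
# speeding did not differ by race.
print(did.params["post_citation:driver_of_color"])
```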

Figure 3: The panel on the left plots the regression coefficient for the likelihood of speeding more than 10 mph over the limit as a function of the number of weeks since the traffic stop for each race group. Week 0 on the x-axis denotes the week the driver was cited. The panel on the right plots the differences between the regression coefficients for each race group along with 95-percent confidence intervals.

Our finding that reoffense rates are similar across race groups suggests that this rationale cannot justify the observed differences in citation rates.

Limitations

While our results are robust to changes in our data construction choices, model form, and a broad set of controls, it is important to call out a few caveats and limitations. First, we are unable to observe instances where an officer let a driver go with a warning instead of a formal citation. Warnings add a layer between committing the offense and actually being written up, and access to those data might change our point estimates. Second, while we are able to account for the main speeding behavior and vehicle type, we cannot observe other types of erratic behavior or vehicle features (such as the car being dinged up or dirty) that might influence the stop decision. Finally, our dataset of drivers differs from the overall Florida population: drivers on Lyft’s platform are more likely to be male, more likely to belong to communities of color, and slightly younger. Drivers are also required to undergo a background check before they start driving for Lyft and can be offboarded for unsafe driving, which may select for safer drivers. For these reasons, our results may not generalize perfectly to the larger population of Florida drivers.

These limitations could influence the magnitude of these effects in other settings, but our findings provide an important directional estimate of racial inequity, and the empirical strategy we develop to obtain these findings can be used more widely.

Takeaways

Traffic stops are one of the most common settings for civilian and police interactions⁷. Does race influence these interactions? There is tremendous interest in answering this question, but prior research has had several limitations. Recent technological advances allow researchers to observe huge samples of high-frequency location data, and we demonstrate how this type of data can be used to account for the offender’s behavior. We found that race had a statistically and economically significant impact on the police punishment that civilians received in our dataset.

If you’re passionate about using data science for racial equity and social impact, then take a look at Rideshare Labs Applied Research positions on our careers page.

[1] When we use the phrases “community of color” or “driver of color”, we are referring to (a) person(s) who is/are Black, Hispanic, Asian American, or Pacific Islander. This set was chosen as a function of granularity of available data, and we recognize that it may not be exhaustive of all those who identify as persons of color. We also acknowledge that each community has its own unique history and experience of racism in the United States.

[2] Makowsky and Stratmann, 2009.

[3] Anbarci and Lee, 2014 and Goncalves and Mello, 2021.

[4] Anwar et al., 2021.

[5] King and Zeng, 2001 (Logistic Regression in Rare Events Data).

[6] Kuziemko, 2013; Lockwood et al., 2015; DOJ 2018; Goncalves and Mello, 2020.

[7] Ba et al., 2021.
