Behind the Scenes with Two New Salary Transparency Websites

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive into Figma’s engineering culture. To get full newsletters twice a week, subscribe here.

In The Scoop #42, I went into detail about pay transparency changes in the US. California, Washington State and New York all recently passed legislation that requires employers to include salary or total compensation ranges in job postings. This regulation impacts tech companies with a sizable presence in these regions, and is especially relevant for Big Tech. Companies have started to comply by listing expected pay upfront.

Since January, a wealth of compensation data has been available on these employers’ websites. This created an opportunity to build job sites which collect this data, make it easy to browse, and allow job seekers to apply to jobs paying at or above a certain level.

Several software engineers rose to the occasion by doing precisely this over the past few months. I reached out to them for some “behind the scenes” insights.

Comprehensive.io crawls job sites and parses the data. Users can browse by company and range:

The landing page at Comprehensive.io

The site was cofounded, and partly built, by Roger Lee, the creator of the job cuts tracker Layoffs.fyi. It launched in January 2023, shortly after California's pay transparency law went live. Today, the site tracks 2,000 tech companies and startups. A neat touch: it also tracks what percentage of companies comply with the law by posting salary ranges now that they are required to do so.

I asked Roger why he started the site. He said:

“As a result of running Layoffs.fyi, I’m also keenly aware that hundreds of thousands of tech employees have lost their jobs over the past year. Our hope is that making salary ranges more accessible on Comprehensive.io will help level the playing field for job seekers and employees, and help them navigate this fast-evolving talent market.”

How does Comprehensive.io work and what is the tech stack behind it? Roger explains:

“We built software that automatically visits the Careers pages of these companies every day, finds all the job posts, and uses AI to extract the salary ranges from the text of the job descriptions. For AI, we’ve built a system to efficiently use GPT-4 for this purpose, including auto-crafting prompts and performing pre- and post-processing.

We use React on the frontend, Node.js on the backend, and Postgres for database storage.”
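Roger did not share implementation details, so the following is only a rough sketch of what pre- and post-processing around an extraction step might look like. Everything in it is an assumption for illustration: the function names are hypothetical, and a regular expression stands in for the GPT-4 call, which is not shown. Pre-processing narrows a job description down to sentences that mention a dollar amount; post-processing rejects results that do not form a sensible range.

```python
import re

# Matches amounts such as "$130,000", "$130k" or "$95,500.50".
# A regex stand-in for the model-based extraction described in the quote.
AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)\s?([kK])?\b")


def candidate_sentences(text):
    """Pre-processing: keep only sentences that mention a dollar amount."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if AMOUNT.search(s)]


def to_number(match):
    """Convert a regex match like '$130k' or '$130,000' to an integer."""
    value = float(match.group(1).replace(",", ""))
    multiplier = 1000 if match.group(2) else 1
    return int(value * multiplier)


def extract_range(sentence):
    """Post-processing: return (low, high), or None if no valid range is found."""
    amounts = [to_number(m) for m in AMOUNT.finditer(sentence)]
    if len(amounts) < 2:
        return None  # a single amount (e.g. a bonus) is not a range
    low, high = amounts[0], amounts[1]
    if low > high:
        return None  # reject reversed or nonsensical ranges
    return (low, high)


if __name__ == "__main__":
    posting = (
        "We are hiring engineers. The base salary range for this role is "
        "$130,000 - $200,000 per year. Apply today!"
    )
    for sentence in candidate_sentences(posting):
        print(extract_range(sentence))
```

Filtering before the expensive step keeps the amount of text sent to a model small, and validating afterwards catches obviously wrong output regardless of how the extraction itself is done.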

Neat - and it’s nice to hear of the team already adopting GPT-4 for a practical use case. Given Roger has been running the site for more than 3 months, I asked him what interesting things he’s come across. He mentioned 3 things:

“1. Although much has been made about outliers like Netflix who post unhelpfully-wide salary ranges, it turns out the vast majority of companies are posting salary ranges in good faith.

For example, the average salary range posted on a job listing is $130,000 - $200,000. This represents a width of +/- 21% from the midpoint: right in line with what HR professionals consider to be best practice for the width of an internal salary range.

2. The percentage of tech companies complying with the California transparency law has jumped from 28% on January 1 (when the law went live) to 58% today. In NYC, where the law has been live since last October, the compliance rate is 70%.

3. The average salary posted for a ‘Machine Learning Engineer’ job title is 19% higher than for a ‘Software Engineer’ job title!”
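The "+/- 21% from the midpoint" figure in the first point can be checked with a few lines of arithmetic. This is a minimal sketch using only the two endpoints quoted above; the function name is made up for illustration.

```python
def range_width(low, high):
    """Return the midpoint and the half-width as a fraction of the midpoint."""
    midpoint = (low + high) / 2
    half_width = (high - low) / 2
    return midpoint, half_width / midpoint


midpoint, spread = range_width(130_000, 200_000)
print(f"midpoint=${midpoint:,.0f}, spread=+/-{spread:.1%}")
# prints "midpoint=$165,000, spread=+/-21.2%"
```

The midpoint is $165,000 and each end sits $35,000 away from it, which is where the roughly 21% comes from.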

Finally, I asked Roger if he can share the highest and lowest compensation packages they picked up on. The highest ones are:

And the lowest ones:

Levels.fyi Jobs is similar to Comprehensive.io, except it also tries to fill the gap on total compensation estimates for job postings that list base salary, but not equity or bonus components:

The landing page for Levels.fyi Jobs

I reached out to Levels.fyi cofounder Zuhayeer Musa, asking how the site works. He shared:

“I'd preface everything by saying that this is very much a v1 of our jobs product and we plan to iterate and build a lot more as we get feedback.

We use a data provider to get jobs data and don’t scrape ourselves. Most jobs vendors have a ton of ‘junk jobs,’ so we spent a fair bit of time culling the dataset to jobs that are unique.

We put the jobs data into Amazon S3. We have a network of Lambdas that fire any time new data is added. Our system is purely serverless when it comes to processing the data. During processing, we match companies, titles and more, with our dataset. We also enrich the jobs with total compensation estimates and benefits in this step.”
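To make the "Lambdas that fire any time new data is added" idea concrete, here is a minimal sketch of a Python Lambda handler reacting to an S3 event notification. It is not the Levels.fyi code: the handler only lists the new objects, and the matching and enrichment steps are left as a comment because their details were not shared. The one part that is standard is the event shape, where each record carries the bucket name and a URL-encoded object key.

```python
from urllib.parse import unquote_plus


def new_objects(event):
    """Yield (bucket, key) for each object referenced in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def handler(event, context):
    """Entry point that Lambda would invoke when new objects land in the bucket."""
    processed = []
    for bucket, key in new_objects(event):
        # A real function would download the object here (for example with boto3)
        # and run the company/title matching and compensation enrichment steps.
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}


if __name__ == "__main__":
    # Hypothetical bucket and key names, used only to exercise the handler locally.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "jobs-raw"},
                    "object": {"key": "2023-03-01/batch+1.json"}}}
        ]
    }
    print(handler(sample_event, None))
```

Because each upload triggers its own invocation, this style of pipeline needs no always-on servers, which fits the "purely serverless" description.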

It’s early days, but I asked Zuhayeer about an engineering challenge they’ve encountered. It was around unique searches:

“We were surprised to see just how many job searches are unique! For the landing page, this is less of a problem as we use caching heavily. However, we wanted to build powerful filters for users. We have filters like the ability to filter by particular benefits, total comp, company valuation, and others.

However, all these filters mean that there are more unique requests, and we can’t take advantage of caching as much as we’d like.”
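The tension between rich filters and caching can be illustrated with a small sketch. It assumes a simple in-memory cache keyed on the filter set; the filter names and the `SearchCache` class are hypothetical, not the Levels.fyi implementation. Normalizing the key makes equivalent searches share an entry, but any genuinely new filter combination is still a miss.

```python
import json


def cache_key(filters):
    """Build a canonical key so equivalent filter sets map to the same entry."""
    cleaned = {}
    for name, value in filters.items():
        if value in (None, "", []):
            continue  # an unset filter should not change the key
        if isinstance(value, list):
            value = sorted(value)  # list order should not matter
        cleaned[name] = value
    return json.dumps(cleaned, sort_keys=True, separators=(",", ":"))


class SearchCache:
    """Tiny in-memory cache that counts hits and misses."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, filters):
        key = cache_key(filters)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._search_fn(filters)
        return self._store[key]


if __name__ == "__main__":
    cache = SearchCache(lambda filters: ["placeholder result"])
    cache.get({"min_total_comp": 200000, "benefits": ["401k", "remote"]})
    cache.get({"benefits": ["remote", "401k"], "min_total_comp": 200000, "title": ""})
    cache.get({"min_total_comp": 210000})  # a new combination, so a miss
    print(cache.hits, cache.misses)
```

The second request is a hit only because it is the same search written differently. With many independent filters, the number of distinct combinations grows quickly, so most requests look like the third one.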

Zuhayeer shared that they’re playing around with OpenAI embeddings and vector databases:

“We’re still experimenting with some better search techniques and have started prototyping semantic search powered by OpenAI embeddings. We are super excited to start playing with vector databases - ones that store and index vector embeddings we get from natural language processing models like OpenAI embeddings.”
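The core idea behind embedding-based semantic search is to rank items by how close their vectors are to the query's vector. The sketch below shows that ranking step with cosine similarity. The three-dimensional vectors are made-up toy values standing in for real embeddings, which would come from an embedding model and have far more dimensions; a vector database does the same kind of lookup with an index instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]


if __name__ == "__main__":
    # Toy vectors, invented for illustration only.
    index = {
        "Machine Learning Engineer": [0.9, 0.1, 0.0],
        "Data Scientist": [0.8, 0.3, 0.1],
        "Frontend Engineer": [0.1, 0.9, 0.2],
    }
    query = [1.0, 0.1, 0.0]  # stands in for the embedding of a search phrase
    print(top_k(query, index))
    # prints ['Machine Learning Engineer', 'Data Scientist']
```

Unlike keyword matching, this ranks results by vector proximity, so a job title can surface even when it shares no words with the query.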

Finally, Zuhayeer asked me to relay that they’d love to get feedback if you use the site - via the on-site widget or the hello@levels.fyi email.

Thanks to both Roger and Zuhayeer for sharing details. You can browse their sites, Comprehensive.io and Levels.fyi Jobs.

This was one out of the five topics covered in this week’s The Scoop. A lot of what I share in The Scoop is exclusive to this publication, meaning it’s not been covered in any other media outlet before and you’re the first to read about it.

The full The Scoop edition additionally covers:

  1. Big Tech realizes quick layoffs in Europe can’t be done. Meta and Amazon are realizing what companies like Microsoft and Uber learned the hard way: that in Europe you cannot hire fast, then fire fast. I share my first-hand experience of Uber’s 2019 and 2020 layoffs in the Netherlands, and what employee protection might mean for Big Tech and hiring in Europe. Exclusive.
  2. Mercedes-Benz’s compensation philosophy. A deep dive into how software engineers at the German car maker are paid in Germany, and the influence the country’s biggest workers’ union has on salary bands. Exclusive.
  3. Unexpectedly big raises at Hubspot. I’ve talked with an engineering manager who’s been surprised by how high compensation increases are at the publicly traded tech company. Exclusive.
  4. Total compensation drops at Amazon. This week is when engineers at Amazon find out about their compensation raises. Engineers told me how things are going, and I’ve heard disappointment and a drop in total compensation. Exclusive.

Read the full The Scoop here.

Subscribe to my weekly newsletter to get articles like this in your inbox. It's a pretty good read - and the #1 tech newsletter on Substack.