Scaling Salt for Remote Execution to support LinkedIn Infra growth

LinkedIn Engineering

It discovers master hosts using facts available on hosts (like fabric, tags, etc.) and the li-salt-master DNS record exposed by LinkedIn service discovery. li-minion also defines its own resource limits, such as the number of file operations and active memory, to avoid disturbing production services.

Metal as a Service (MaaS): DIY server-management at scale

LinkedIn Engineering

With a basic understanding of how server-upgrade workflow needed to evolve, the PSSEBuild team interacted with various SRE teams to gather requirements that best fit their usability and necessities. With the completed design, we wrote an API that SRE teams could directly interact with.

Improved Alerting with Atlas Streaming Eval

Netflix Tech

A few years ago, we were paged by our SRE team due to our Metrics Alerting System falling behind — critical application health alerts reached engineers 45 minutes late! While actionability is subjective and may vary by use-case, reliability is non-negotiable. In other words, false positives are bad but false negatives are the absolute worst!

Costwiz: Saving cost for LinkedIn enterprise on Azure

LinkedIn Engineering

It’s now much easier to manage large infra requirements that have traditionally demanded an amalgamation of teams like DBA, Infra-SRE, Onprem-SMEs, network managers, and access control managers working together. We started with parsing provisioner details in resources and then processed the tags in resources and resource groups.

Operation-Based SLOs

Zalando Engineering

Anyone who has been following the topic of Site Reliability Engineering (SRE) has likely heard of Service Level Objectives (SLOs) and Service Level Indicators (SLIs). SLIs and SLOs are at the core of SRE practices. When we introduced SRE at Zalando back in 2016, we also introduced SLOs.

Keeping Pace with New iOS Releases

Pandora Engineering

A year ago, our macOS infrastructure consisted of aging Mac Minis and Mac Pros, and it was cumbersome for our Tech Ops, SRE, and NetOps teams to host and maintain this infrastructure. Infrastructure on Demand: environment rollback became as simple as provisioning an on-demand environment from a different image or a previous image tag/version.

Top 14 Azure Tools You Must Know in 2023

Knowledge Hut

This blog walks you through the top Azure monitoring and development tools that every SRE and DevOps engineer must know. Features: provisioned with tags to filter incoming and outgoing traffic. Also, the Microsoft Azure Fundamentals certification will help you validate your expertise in how Azure supports security, privacy, compliance, and trust.