Job Details

Senior Software Engineer - Reliability (Remote)

2025-11-30 Jobgether all cities,AK

Description:

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Software Engineer - Reliability (Remote) in California (USA).

We are seeking a Senior Software Engineer specializing in Reliability to help design, implement, and operate systems that ensure cloud-based production environments remain secure, compliant, and highly available. In this role, you will be a foundational member of a new Site Reliability Engineering (SRE) team, building processes and infrastructure to support mission-critical workloads in regulated environments. You will collaborate with engineering, product, and operational teams to define service-level objectives, develop monitoring and automation, and improve overall system reliability. The ideal candidate is experienced in cloud infrastructure, automation, and observability, and enjoys solving complex distributed system challenges. This role offers the opportunity to shape the SRE culture and practices from the ground up, while contributing to high-impact projects that support regulated and commercial operations.

Accountabilities

Design and implement observability practices including metrics, traces, dashboards, logs, and alerting for production systems
Partner with engineering, product, and lab teams to define SLIs/SLOs, error budgets, and incident response procedures
Develop and maintain operational playbooks and runbooks for reliability and compliance
Participate in on-call rotations, championing automation and self-healing for production systems
Contribute to deployment processes and infrastructure automation using Infrastructure as Code (IaC)
Collaborate on incident reviews, postmortems, and disaster recovery exercises to improve system reliability
Mentor peers, promote best practices, and help establish the SRE culture and strategy

Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent experience
5+ years of experience in software engineering, SRE, or DevOps roles (Python or Go preferred)
Hands-on experience deploying and operating production workloads in cloud environments (AWS, GCP, or Azure)
Expertise in Infrastructure as Code (Terraform, Pulumi, Bicep/ARM)
Experience with incident management platforms (e.g., Incident.io, ServiceNow, Opsgenie, PagerDuty)
Strong knowledge of Kubernetes (AKS, GKE, EKS) and cloud networking
Proficiency with observability platforms such as DataDog, Prometheus/Grafana, or OpenTelemetry
Excellent troubleshooting, root-cause analysis, and automation skills
Ability to work autonomously and collaborate effectively with cross-functional teams
Experience in regulated environments (healthcare, biotech) and familiarity with compliance-driven change management are a plus

Benefits

Competitive salary: $131,325-$201,000 USD, with potential for pre-IPO equity and cash bonuses
Comprehensive medical, dental, and vision coverage
Paid time off and holidays
Remote work flexibility
Opportunities for professional growth, mentorship, and leadership in a foundational SRE team
Participation in shaping processes for high?reliability systems in regulated environments

Seniority Level

Mid-Senior level

Employment Type

Full-time

Job Function

Information Technology

Industries

Non-profit Organizations and Primary and Secondary Education
