Site Reliability Engineer (SRE)

Nominal
About Nominal
Nominal is building the software infrastructure powering the world’s most advanced hardware systems — from spacecraft and autonomous vehicles to next-generation industrial machines. Our platform ingests high-rate telemetry, validates complex autonomy software in real time, and enables engineers to iterate faster without sacrificing safety or precision. We’re a small, fast-moving team of engineers and operators who own problems end-to-end, work across disciplines, and thrive on challenges at the intersection of hardware and software.
As a dual-use platform, we serve top-tier commercial and defense customers, including the U.S. Navy, United States Air Force, Shield AI, and Anduril. We’re backed by Sequoia, General Catalyst, Founders Fund, Lux Capital, and Lightspeed Ventures. Our team draws experience from SpaceX, Palantir, Anduril, Applied Intuition, and other leading companies — united by a common mission: enabling hardware engineers to push the boundaries of advanced technology with speed, safety, and precision.
We’re looking for a Site Reliability Engineer to take on a high-leverage role focused on strengthening the foundations of our distributed systems and improving how the entire team builds, ships, and maintains software. This role is ideal for someone who thrives in complex environments, has deep experience with incident response and production systems, and is driven to create safer, faster systems through smart infrastructure and process design.

🚀 What You’ll Do

  • Drive reliability and observability improvements across large-scale distributed systems.
  • Serve as a force multiplier across all engineering teams by reducing downtime, improving tooling, and freeing up senior engineers from firefighting.
  • Own and evolve our incident review process, leading postmortems and embedding learnings into tools, practices, and culture across the company.
  • Collaborate with teams to improve release hygiene, including automating release gating (e.g., ensuring code bakes in staging for appropriate windows), preventing code from stagnating in staging environments, and implementing pre-prod automated test pipelines to catch issues early.
  • Build and maintain Nominal’s gRPC middleware to ensure safe, observable, and performant service communication.
  • Improve alerting, debugging, and monitoring to ensure production health and rapid root cause analysis.

🔍 Who You Are

  • You have 7+ years of experience in software engineering with a strong focus on production systems and distributed architectures.
  • You thrive in high-leverage roles that improve how everyone else builds, ships, and fixes software.
  • You’ve led or played a significant role in incident response and in building systems and a culture around continuous improvement.
  • You’re excited by complexity, not afraid of it, and you’re deeply motivated to make systems safer and teams faster.

⚡ Skills that supercharge us

  • Experience working on distributed systems at scale.
  • Hands-on experience with Kafka/Redpanda, PostgreSQL or other SQL databases, MongoDB or other NoSQL databases, and ClickHouse or other OLAP databases.
  • Deep understanding of release automation, CI/CD, and code lifecycle management.
  • Familiarity with gRPC and experience building shared infrastructure components like middleware.
  • A systems mindset: you understand the ripple effects of a single bug and know how to design systems that prevent them.

✨ Benefits & Perks

  • 🏥 100% coverage of medical, dental, and vision insurance
  • 🏖️ Unlimited PTO and sick leave
  • 🍽️ Free lunch, snacks, and coffee
  • 🚀 Professional Development Stipend
  • ✈️ Annual company retreats

This job description is written to capture a range of experience levels from 2 years to 15+ years, which is why you’ll see a wide band listed. Your actual base salary will be determined on a case-by-case basis and may vary based on a range of considerations, including job-related knowledge and skills, education, prior experience, and other business needs. The listed salary range represents an estimate for base compensation only. Base salary is just one part of the total rewards package. Eligible employees may also receive highly competitive equity grants in the form of stock options, allowing you to share in the company’s long-term success.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.
ITAR Requirements
To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful, permanent resident (aka green card holder), (iii) Refugee under 8 U.S.C. § 1157, or (iv) Asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.

To apply, please visit the following URL: https://jobs.lever.co/nominal/da37dc7d-ba45-4cab-8aa2-2f7e01db5176/apply?lever-source=Job%20postings%20feed
