Get matched →

Staff Site Reliability Engineer

at Zscaler

ZscalerRemote - USA; San Jose, California, USAPosted 2026-06-01

Want this job?

Let DoneWithWork tailor your resume to this exact posting, write the cover letter, and submit the application for you.

Apply with DoneWithWork — $19.99/mo

View original posting →

About Zscaler Zscaler accelerates digital transformation to ensure our customers can be more agile, efficient, resilient, and secure. As an AI-forward enterprise, we are constantly pushing the envelope, leveraging the world’s largest security data lake to power our cloud-native Zero Trust Exchange platform. This innovation protects our customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location. Here, impact in your role matters more than title and trust is built on results. We say, impact over activity. We seek innovators who actively use AI to amplify their impact and who thrive in an environment where we leverage intelligent systems to stay ahead of evolving threats. We believe in transparency and value constructive, honest debate—we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership, and accountability. We value high-impact, high-accountability with a sense of urgency where you’re enabled to do your best work and embrace your potential. If you’re driven by purpose, thrive on solving complex challenges, and want to be part of the team that’s helping to secure the AI age, we invite you to bring your talents to Zscaler and help shape the future of cybersecurity.Role We are looking for a Staff Site Reliability Engineer to join our team. This role will report to the Senior Manager, Site Reliability Engineering and offers the flexibility of hybrid (3 days a week) out of San Jose, CA, or can be performed fully remote.  As a key member of the Zero Trust Exchange team, you will be responsible for all aspects of the Zscaler production data center services, including servers, operating systems, storage, and supporting systems. You will be an instrumental part of the Site Reliability Engineering team, ensuring the availability, latency, performance, efficiency, and scalability of a cloud that processes tens of billions of transactions daily. What you’ll do (Role Expectations) Own the reliability of a large-scale cloud service (Linux/BSD, bare metal, Kubernetes, custom load balancing, SD-WAN) by partnering with Engineering and Network teams to define requirements early, conduct operability reviews, and contribute code/design docs for platform resilience Develop and operate end-to-end observability (metrics/logs/traces, dashboards, alerting) and incident tooling to manage SLOs/error budgets, reduce noise, and improve system detection and diagnosis Participate in an on-call rotation to lead full-cycle incident response; perform deep cross-stack troubleshooting (OS, networking, distributed systems, packet captures, core dumps) to drive permanent software fixes and codify learnings into runbooks and tests Build and maintain everything-as-code for fleet and service lifecycle, driving provisioning, configuration, release automation, canary deployments, and complex rollout/rollback workflows Continuously improve platform hygiene through consistent OS/app upgrades, dependency/vulnerability patching, capacity and performance tuning, and strict CI/CD validation prior to production rollouts Who You Are (Success Profile) You act like an owner. Your passion for the mission fuels your bias for action. You operate with integrity because you genuinely care about the outcome. You adapt to what’s needed, navigating seamlessly between high-level strategy and hands-on execution. You are a problem-solver. You seek out challenges because you are energized by finding solutions, knowing that solving the hard problems delivers the biggest impact. You are a high-trust collaborator. You are ambitious for the team, not just yourself. You embrace our challenge culture by giving and receiving ongoing feedback—knowing that candor delivered with clarity and respect is the truest form of teamwork and the fastest way to earn trust. You operate with urgency. You understand that in a high-growth environment, speed and quality are not mutually exclusive. You have a relentless focus on execution and a bias for action, delivering high-impact results quickly to win for the customer and the team. You think at scale. You connect your day-to-day work to the larger company mission and think globally. You build solutions, processes, and teams that are not just effective today but are built to last and support a high-growth, global organization. What We’re Looking for (Minimum Qualifications) US Citizenship is required (due to the nature of assigned customers) and 5+ years industry experience in software engineering, infrastructure software, and/or platform engineering Proficiency in at least one programming language (such as Python, Bash, or Go) with demonstrated ability to write production-quality code (testing, code reviews, CI, maintainable design,scripting for diagnostics Strong Linux/Unix systems fundamentals (process/memory, filesystems, networking stack basics, debugging/perf troubleshooting) and solid understanding of networking protocols and components (e.g., HTTP, DNS, TCP/IP, ICMP, OSI model, subnetting, and load balancing/traffic concepts) Proven experience operating production services (including incident response, troubleshooting, reducing toil) and ability to participate in on-call rotations and support occasional after-hours or weekend deployments Managing BSD in production, with a focus on driving systemic fixes through platform engineering What Will Make You Stand Out (Preferred Qualifications) Proven expertise in operating Kubernetes at scale Deep experience with the Prometheus/OpenTelemetry ecosystems, including instrumenting golden signals, defining SLOs, and performing alert tuning to ensure high-availability environments #LI-KM9 #LI-RemoteZscaler’s salary ranges are benchmarked and are determined by role and level. The range displayed on each job posting reflects the m

Want this job?

Let DoneWithWork tailor your resume to this exact posting, write the cover letter, and submit the application for you.

Apply with DoneWithWork — $19.99/mo

View original posting →