Customer Reliability Engineer
at Cloudflare
Job description
About Us

At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world’s largest networks, which powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare have all their web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine’s Top Company Cultures list and ranked among the World’s Most Innovative Companies by Fast Company.

At Cloudflare, we’re not looking for people who wait for a polished roadmap; we’re looking for the builders who see the cracks in the Internet that everyone else has simply learned to live with. We value candidates who have the instinct to spot a "normalized" problem and the AI-native curiosity to create a solution using the latest tools. Our culture is built on iteration: we leverage AI to ship faster today and make it better tomorrow, while ensuring that every improvement, no matter how small, is shared across the team to lift everyone up. If you’re the type of person who values curiosity over bureaucracy and sees AI as a partner in solving tough problems to keep the Internet moving forward, you’ll fit right in.

Available Locations: Austin, Texas

About the Role

Cloudflare built its reputation helping build a better Internet: defending millions of sites and giving away SSL and DDoS mitigation when the industry charged premium prices. In an increasingly dangerous world, the scope of that mission has changed. We are becoming something more: critical infrastructure. Banks run their payment rails on us. Governments run public services on us.
Media companies depend on us during live events. Health systems depend on us to provide care. Reliability for these customers is no longer a feature of our product. It is a mission.

Serving that customer base demands a different operating model. Traditional support organizations route tickets. Traditional engineering organizations ship features. Neither alone is enough when the stakes are this high. We are pivoting to something different: a customer-facing engineering organization, directly engaged with our customers at scale. This is work a central dev team cannot do from inside the network.

The Customer Reliability Engineering function is the spine of that pivot. CRE is SRE applied outward: the same engineering discipline, applied to the reliability of the systems our customers run on Cloudflare. You are the engineer who owns the problems that matter most to the customers who matter most, and you contribute directly to our products and tooling, in partnership with Product Engineering, to hold that standard across the entire customer base.

The Role

CRE is both a rapid response team and a proactive engineering team. You fix things at the edge as they come up, and you help build the product capabilities that identify customer issues before they become a crisis. Both modes are equally core.

Rapid response. When a customer issue surfaces that is high-severity, cross-layer, and complex, you are the engineer who answers. You reproduce the defect, isolate the root cause across Cloudflare's infrastructure and the customer's stack, drive the fix with Product Engineering, and confirm resolution. You hold on-call for high-severity incidents as part of a global shift rotation.

Proactive engineering.
When no fire is burning, you work with Product Engineering and our platform teams to build the capabilities that make the next fire cheaper or unnecessary: telemetry pipelines that correlate signals across the customer base, detectors that fire before a human notices, diagnostic tooling that scales across hundreds of customers, and automation that reduces toil for Customer Support. Every incident you carry generates engineering output that reduces the cost of recurrence. The work compounds.

Cloudflare is building CRE as an AI-native function. You will work with, and help build, agents and tooling that pre-diagnose incidents, surface relevant logs and configuration, and propose fixes with cited evidence. Engineers who ship AI-assisted diagnostics are the ones defining this discipline.

What You Might Work On

Rapid response:
- Own a Sev-1 incident where a large financial services customer sees asymmetric latency from a single POP. Trace it through BGP routing and origin configuration. Produce the fix upstream.
- Diagnose a recurring WebSocket disconnect that a media customer has been fighting for weeks. Isolate it to a specific interaction between WAF and their origin load balancer. Drive the fix with Product Engineering.
- Partner with a government customer's SRE team during an active DDoS event. Help them shape their Magic Transit and WAF configuration in real time.

Proactive engineering:
- Build, with Product Engineering, a distributed tracing capability that correlates Cloudflare edge signals with customer origin metrics, so that a single query tells the story of a failing request end-to-end.
- Ship a detector for a class of WAF false positives that are silently degrading several customers. Get it into production before the next renewal cycle.
- Prototype an AI agent that takes a new customer case, pulls relevant logs and config, and proposes a root cause with linked evidence. Deploy it internally. Measure whether it makes engineers faster. Iterate.
Responsibilities

Rapid incident response and root cause analysis. Own the most complex, high-severity customer issues end-to-end, from first signal through confirmed resolution. Lead deep-dive debugging across the full stack: edge, network, DNS, transport, APIs, application, customer-side configuration. Reproduce defects, validate fixes with En