Staff ML Engineer - ML Infrastructure
at Samsara
Job description
Who we are

Samsara (NYSE: IOT) is the pioneer of the Connected Operations™ Cloud, which is a platform that enables organizations that depend on physical operations to harness Internet of Things (IoT) data to develop actionable insights and improve their operations. At Samsara, we are helping improve the safety, efficiency and sustainability of the physical operations that power our global economy. Representing more than 40% of global GDP, these industries are the infrastructure of our planet, including agriculture, construction, field services, transportation, and manufacturing, and we are excited to help digitally transform their operations at scale.

Working at Samsara means you’ll help define the future of physical operations and be on a team that’s shaping an exciting array of product solutions, including Video-Based Safety, Vehicle Telematics, Apps and Driver Workflows, and Equipment Monitoring. As part of a recently public company, you’ll have the autonomy and support to make an impact as we build for the long term.

About the role:

Samsara is the industry leader in AI for physical operations. We’re hiring a Staff / Senior Staff Machine Learning Infrastructure Engineer to lead the design and evolution of our end-to-end ML platform powering Safety AI and adjacent product areas. This role combines deep platform ownership with direct product impact, enabling teams to build, deploy, and scale ML systems that improve real-world safety outcomes.

This is a remote position open to candidates based in the United States.

You should apply if:

You want to impact the industries that run our world: The software, firmware, and hardware you build will result in real-world impact, helping to keep the lights on, get food into grocery stores, and most importantly, ensure workers return home safely.
You want to build for scale: With over 2.3 million IoT devices deployed to our global customers, you will work on a range of new and mature technologies driving scalable innovation for customers across industries driving the world's physical operations.
You are a life-long learner: We have ambitious goals. Every Samsarian has a growth mindset as we work with a wide range of technologies, challenges, and customers that push us to learn on the go.
You believe customers are more than a number: Samsara engineers enjoy a rare closeness to the end user and you will have the opportunity to participate in customer interviews, collaborate with customer success and product managers, and use metrics to ensure our work is translating into better customer outcomes.
You are a team player: Working on our Samsara Engineering teams requires a mix of independent effort and collaboration. Motivated by our mission, we’re all racing toward our connected operations vision, and we intend to win, together.

In this role, you will:

ML Platform & Infrastructure
Design, build, and operate Samsara’s end-to-end ML platform (training, experimentation, batch/online inference, edge) used by multiple Safety AI product teams.
Evolve shared training and experimentation infrastructure (orchestration, clusters, environments) and standardize tracking, evaluation, and regression testing for fast, safe iteration.

Experimentation & Measurement
Partner with product and applied ML teams to ship ML-powered features (CV models, EcoDriving insights, LLM-based reporting) that improve safety, reliability, and cost efficiency.
Lead throughput and cost modeling for new ML features, from exploration to production-scale capacity planning, to inform roadmap and go/no-go decisions.
Drive experiment design and evaluation, defining success metrics, structuring A/B or offline tests, and turning results into product and technical decisions.
Inference & Edge Deployment
Design and operate scalable online and batch inference systems (Ray, Spark), including deployment patterns, observability, SLOs, and unified training-to-production workflows.
Partner with firmware and edge teams to package, validate, and deploy models to Samsara devices, and build feedback loops from edge to cloud for continuous improvement.

Reliability, Security & Operations
Own reliability, observability, and security for ML systems across cloud and edge, including on-call practices, incident response, and infrastructure hardening.
Own or co-own end-to-end technical delivery for high-priority or high-risk initiatives, from modeling and system design through production rollout.

Leadership & Culture
Provide Staff+/Senior-Staff technical leadership on ML infrastructure architecture and strategy, influencing cross-team decisions and mentoring engineers and applied scientists.
Drive strong developer experience through documentation, office hours, and best practices, while contributing to and representing Samsara in open source communities (Ray, Spark, RayDP).
Champion and role model Samsara’s cultural principles: Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team.

Minimum requirements for the role:

10+ years of overall experience in machine learning engineering or related fields, with a strong track record of building and operating large-scale ML systems.
Strong experience with distributed computing frameworks such as Ray and/or Spark.
Hands-on experience with cloud infrastructure (AWS), containers/Kubernetes, and production observability tooling.
Proven experience building or supporting ML platforms (training, experimentation, or inference) used by multiple teams.
Solid understanding of ML fundamentals including evaluation, experiment design, and model iteration in production environments.
An ideal candidate also has:

Experience shipping ML-powered features end-to-end, from design through production and iteration, with measurable impact on product or business metrics.
Background in computer vision and/or LLM-based systems in production environments.
Experience with edge or on-device ML and collaboration with firmware or embedded teams.
Famil