Lead Platform Engineer – MLOps & DevOps
at Mastercard
Job description
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary
Senior/Lead Platform Engineer – MLOps & DevOps

--------------------------------------------------------------------------
Lead DevOps Engineer
We are hiring a Lead DevOps/Platform Engineer to help build the next-generation AI/ML infrastructure used by Mastercard Foundry. This role sits at the intersection of DevOps, Platform Engineering, and MLOps, supporting R&D teams who are still shaping their product direction.

You will own the core platform components that power AI experimentation and its delivery to end users: Azure, AKS, GPU compute, Databricks, Terraform/Terragrunt, GitHub Actions, and GitOps. You will help build and scale AI/ML infrastructure to support our innovation efforts, with a focus on automation, observability, and developer experience.

The ideal candidate is hands-on, curious, motivated, and comfortable working in fast-moving R&D environments. This is not an AI modelling role; it is a deeply technical platform role focused on enabling rapid, safe, reproducible ML development.
--------------------------------------------------------------------------
What You'll Do

Drive Platform Infrastructure: Own DevOps and infrastructure for MLOps and agentic AI systems, establishing reusable patterns for CI/CD, scalable inference, orchestration, observability, and cost control. Design secure, scalable, repeatable systems using Infrastructure as Code (IaC) to support R&D workloads.

Build Secure CI/CD & Automation Systems: Enable secure tool access, workload isolation, and infrastructure for LLM-backed APIs and MCP servers, while partnering with security and compliance teams on access control, infrastructure governance, and auditability.

Ensure Reliability & Observability: Implement monitoring, logging, and alerting. Tune observability for ML-specific workloads to ensure performance, reliability, and operational insight.

Provide Technical Leadership: Offer hands-on leadership across DevOps and platform initiatives. Review code, enforce best practices, improve tooling, and promote clean, well-tested infrastructure.

Cross-Functional Collaboration: Partner with ML, software, and platform engineers to design deployment strategies, scope work, manage agile deliverables, and meet milestones.

--------------------------------------------------------------------------
What You'll Bring

Extensive DevOps Experience: 8–12+ years in DevOps, SRE, or platform engineering, including senior/lead roles. Experience designing end-to-end infrastructure systems, solving scale and performance challenges, and operating platforms in production.

Cloud & Infrastructure Expertise: Strong skills in cloud platforms (AWS, Azure, or GCP) and AI/ML components such as Databricks, Azure ML, and MLflow. Deep experience with Infrastructure as Code using Terraform and IaC orchestration tools such as Terragrunt.

Container & Orchestration Mastery: Expertise in Kubernetes and Docker, including how they optimise ML development workflows.
Experience with container security, networking, and cluster management at scale.

AI/ML Platform Knowledge: Understanding of ML workflow requirements: model registries, feature stores, AI agents, Retrieval-Augmented Generation (RAG) techniques, and frameworks such as LangChain/LlamaIndex.

Leadership & Mentorship: Ability to translate ambiguous goals into clear plans, guide engineers, and lead technical execution.

Problem-Solving Mindset: Approach issues systematically, using analysis and data to select scalable, maintainable solutions.

--------------------------------------------------------------------------
Required Skills

Education & Background: Bachelor's degree in Computer Science, Engineering, or a related field. 8–12+ years of proven experience architecting and operating production-grade infrastructure, especially infrastructure supporting AI/ML workloads.

Infrastructure as Code: Expert in Terraform and IaC orchestration tools such as Terragrunt. Strong experience with configuration management and GitOps practices.

Programming & Scripting: Advanced Bash and Python skills and strong software engineering fundamentals (version control, CI, code reviews). Familiarity with Go or other systems programming languages is a plus.

CI/CD & Automation: Hands-on experience with Jenkins, GitHub Actions, GitLab CI, or similar tools. Strong understanding of pipeline design, artifact management, and deployment strategies.

Monitoring & Observability: Experience with monitoring stacks such as Prometheus, Grafana, Splunk, and ELK. Skilled in building dashboards and alerts, and in tuning observability for ML-specific use cases.

Cloud Infrastructure: Experience deploying systems on AWS, Azure, or GCP. Familiar with cloud-native services, serverless computing, and managed Kubernetes offerings (EKS, AKS, GKE). Comfortable with Linux internals and shell scripting.

Security & Networking: Knowledge of security best practices for MLOps, including data privacy, compliance, access controls, and encryption. Understanding of modern networking protocols (such as mTLS) and secure service communication.

Collaboration & Agile Delivery: Strong communication skills and experien