Sr. Engineer, Site Reliability - Retail Mobility Engineering
at T-Mobile
Job description
At T-Mobile, we invest in YOU! Our Total Rewards Package ensures that employees get the same big love we give our customers. All team members receive a competitive base salary and compensation package - this is Total Rewards. Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches. That’s how we’re UNSTOPPABLE for our employees!

Job Overview

This role ensures the reliability and resilience of digital infrastructure, enabling efficient software development and deployment. It focuses on automating processes and reducing manual effort to prevent operational incidents, improve system performance, and enhance overall operational efficiency.

The role requires expertise in programming, scripting, incident response management, and a variety of technical tools to maintain system robustness, improve deployment quality, and drive continuous operational improvements. Success is measured by system stability, incident reduction, and ongoing gains in operational efficiency.

As part of a Retail Mobility Engineering organization, this role supports a large-scale enterprise device management and application delivery ecosystem that enables critical retail operations.

The Senior Site Reliability Engineer leverages automation, CI/CD practices, scripting, observability, and incident management expertise to improve reliability, scalability, and operational efficiency across a complex technology environment.

This work directly impacts organizational stability and customer experience by ensuring the availability, performance, and reliability of critical systems.

Job Responsibilities:

Automate processes to accelerate software development and deployment while minimizing manual interventions, including CI/CD pipelines, deployment automation, and endpoint management workflows where appropriate.
Design, build, and enhance automation solutions that improve operational efficiency, deployment consistency, and service reliability across complex enterprise environments.
Enhance system reliability and resilience by identifying issues and implementing preventive measures to reduce downtime and improve operational stability.
Conduct root cause analysis and collaborate with problem management teams to prevent incident recurrence and improve system operations.
Leverage programming, scripting, and incident response expertise to improve system robustness, deployment quality, and operational efficiency.
Implement and support secure automation practices, including secrets management, credential lifecycle automation, and integration with approved enterprise security platforms.
Partner with product, engineering, cybersecurity, and operations teams to design and implement scalable deployment and automation solutions.
Support modernization initiatives involving mobile device management platforms, application deployment automation, platform migrations, and operational process improvements.
Improve deployment visibility, monitoring, operational reporting, and observability capabilities to enhance traceability and operational awareness.
Apply problem-solving and analytical skills to prevent operational incidents and maintain system stability.
Continuously learn new skills and technologies to adapt to changing environments and drive innovation.
Perform other duties and projects as assigned.

Education and Work Experience:

Bachelor's degree plus 3 years of related work experience, advanced degree with 1 year of related work experience, or a combination of education and experience deemed equivalent.
Degree in Computer Science, Engineering, or a related technical field.
4-7 years working in operations or DevOps environments.
4-7 years troubleshooting customer-related issues and managing customer relationships.
4-7 years developing software solutions using Python or similar programming languages.

Preferred:

Experience working in Site Reliability Engineering, DevOps, platform engineering, operations, or software development environments.
Experience troubleshooting production issues and supporting business-critical systems.
Experience developing automation solutions using Python, Bash, or similar programming languages.
Experience designing and implementing CI/CD pipelines using GitLab or similar platforms.
Experience automating deployment, configuration management, and operational workflows.
Experience integrating and automating CyberArk, HashiCorp Vault, or similar enterprise secrets-management platforms.
Experience administering Jamf Pro and/or supporting enterprise mobile device management environments.
Experience supporting mobile device lifecycle management, application deployment, policy management, and migration initiatives.
Experience applying cybersecurity best practices across software delivery and operational environments.
Experience contributing to technical design decisions and collaborating across multiple engineering teams.
Experience with cloud platforms, infrastructure automation, and containerized environments.
Experience with observability, monitoring, and operational reporting solutions.

Knowledge, Skills and Abilities:

Strong analytical and problem-solving skills with the ability to identify, troubleshoot, and resolve complex operational issues.
Ability to design, implement, and maintain reliable, scalable automation solutions across complex enterprise environments.
Strong understanding of Site Reliability Engineering (SRE), DevOps practices, system reliability, resiliency, and operational excellence.
Experience designing and implementing CI/CD pipelines using GitLab or similar platforms.
Proficiency in Python, Bash, or similar scripting languages for automation, integration, and operational tooling.
Experience with deployment automation, configuration management, and infrastructure automation practices.
Strong understanding of incident management, root cause analysis, and preventive operational improvement methodologies.
Experience administering an