ML Systems Performance Engineer

at Cerebras Systems

Cerebras Systems | India Office | Onsite | Posted 2026-06-16

Job description

ABOUT THE ROLE

Engineers on the inference performance team operate at the intersection of hardware and software, driving end-to-end model inference speed and throughput. Their work spans low-level kernel performance debugging and optimization, system-level performance analysis, performance modeling and estimation, and the development of tooling for performance projection and diagnostics.

RESPONSIBILITIES

- Build performance models (kernel-level and end-to-end) to estimate the performance of state-of-the-art and customer ML models.
- Optimize and debug our kernel microcode and compiler algorithms to improve ML model inference speed, throughput, and compute utilization on the Cerebras Wafer Scale Engine (WSE).
- Debug and understand runtime performance on the system and cluster.
- Develop tools and infrastructure to help visualize performance data collected from the WSE and our compute cluster.

REQUIREMENTS

- Bachelor's, Master's, or PhD in Electrical Engineering or Computer Science.
- Strong background in computer architecture.
- Exposure to and understanding of low-level deep learning / LLM math.
- Strong analytical and problem-solving mindset.
- 3+ years of experience in a relevant domain (Computer Architecture, CPU/GPU Performance, Kernel Optimization, HPC).
- Experience working on CPU/GPU simulators.
- Exposure to performance profiling and debugging on any system pipeline.
- Comfort with C++ and Python.
