Inference Optimization Engineer (local / edge runtime)

at Intel

Intel | 4 Locations | Posted 2026-06-15

Job description

Job Details:

Job Description:

Our Mission

At Intel, our journey is to transform AI into something safer, more trustworthy, and respectful of human privacy by design. We believe transformative AI should have a positive impact on people: powerful in capability, yet honest about its limits and protective of the data and resources it touches.

To get there, we build agentic AI that combines the best of local and cloud intelligence: private, affordable, and sustainable by design. Small, efficient models run directly on the user's machine (AI PC, edge, on-prem, and beyond), keeping data private and token costs low, while powerful cloud models handle the hardest work: planning, reasoning, and complex problem-solving. Today, neither approach can deliver this alone. Together, they give people real capability without compromise: data stays private, spend stays predictable, and energy use stays in check. We're building intelligence that scales without sacrificing trust, cost, or the planet, because the future of AI should belong to the people it serves.

Role Summary

Make models fast on the hardware people actually own. You optimize inference engines (llama.cpp, vLLM) for constrained local and edge environments (GPUs/iGPUs, Vulkan backends): mostly PC and edge hardware, not datacenter H100 environments.
KV cache, batching, quantization, scheduling, and CPU-overhead reduction are your daily tools. This is the rare skill that makes a hybrid, low-cost agent product viable.

What you’ll do

- Profile and optimize local inference (llama.cpp-vulkan and vLLM) for latency, throughput, and memory on edge hardware
- Tune KV cache, continuous batching, and scheduling for interactive agent workloads
- Drive quantization strategy (GGUF / AWQ / GPTQ) and validate quality impact with the Post-Training team
- Cut CPU overhead and improve engine startup, model load, and lifecycle (start / stop / health)
- Benchmark across hardware tiers and publish honest performance comparisons
- Upstream fixes and patches to open-source engines where it helps us

What you’ll learn / grow into

Curiosity is required. You will develop:

- The internals of modern inference engines and where the milliseconds actually go
- Hardware-aware optimization across iGPU / CPU paths (Vulkan, SYCL, oneAPI, CUDA where relevant)
- The quality-vs-speed-vs-memory trade space for small models
- Interest in local / edge AI and squeezing hardware

Qualifications:

Minimum qualifications are required to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates.
Required Qualifications

- BS/MS in CS, EE, Math or related STEM field
- 5+ years software development background
- Strong in C++ and/or Python; comfortable reading systems-level code
- Understands how LLM inference works (attention, KV cache, decoding)
- Has profiled and optimized real performance problems (CPU or GPU) and can prove the speedup
- Linux, build systems, and low-level debugging expertise

Preferred Qualifications

- Hands-on with llama.cpp, vLLM, ggml, or similar engines
- Experience with GPU / accelerator programming (Vulkan, CUDA, SYCL, Metal) or SIMD / CPU kernels
- Familiarity with quantization formats and their quality trade-offs
- Open-source contributions to inference engines

Requirements listed would be obtained through a combination of industry-relevant job experience, internship experiences, and/or schoolwork/classes/research.

Benefits at Intel

Our total rewards package goes above and beyond just a paycheck. Whether you're looking to build your career, improve your health, or protect your wealth, we offer generous benefits to help you achieve your goals. Go to Intel Benefits | Intel Careers for details of benefits available to you. Intel reserves the right to modify, change or discontinue benefit plans at any time in its sole discretion.

Job Type:
Shift: Shift 1 (United States of America)
Primary Location: US, California, Santa Clara
Additional Locations: US, Arizona, Phoenix; US, California, Folsom; US, Oregon, Hillsboro

Business group:

The Client Computing Group (CCG) is responsible for driving business strategy and product development for Intel's PC products and platforms, spanning form factors such as notebooks, desktops, 2-in-1s, and all-in-ones.
Working with our partners across the industry, we intend to deliver purposeful computing experiences that unlock people's potential, allowing each person to use our products to focus, create and connect in ways that matter most to them.

Posting Statement:

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Position of Trust

N/A

Benefits

We offer a total compensation package that ranks among the best in the industry. It consists of competitive pay, stock bonuses, and benefit programs which include health, retirement, and vacation. Find out more about the benefits of working at Intel.

Annual Salary Range for jobs which could be performed in the US: $170,500.00-315,490.00 USD

The range displayed on this job posting reflects the minimum and maximum target compensation for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific compensation range for your preferred location during the hiring process.

Work Model for this Role

This role will
