What You’ll Do
- Build infrastructure layers supporting large-scale AI workloads
- Integrate compute, networking, and storage for GPU-intensive systems
- Support training and inference pipelines in production environments
- Optimize performance, reliability, and cost efficiency
- Collaborate with research and platform teams on system design
- Automate deployment and scaling of AI infrastructure
- Troubleshoot complex infrastructure and workload issues
- Document deployment architectures and operational workflows
You’ll Thrive Here If You
- Have 4+ years of experience in infrastructure or systems engineering
- Have hands-on experience running AI workloads in production
- Possess a strong understanding of distributed infrastructure trade-offs
- Are comfortable working across software and hardware boundaries
- Are execution-focused with strong debugging skills
Bonus Qualifications
- Experience with containerization and orchestration systems
- Familiarity with ML frameworks and AI pipelines
- Experience with high-throughput networking or storage systems
- Background in data centers or large-scale compute platforms
Why This Role Is Unique
You will enable AI workloads to run reliably and efficiently at scale, bridging research, platform engineering, and real-world deployment.
Details
- Competitive salary and equity based on experience and skill set
- Flexible work environment
- Applicants must be authorized to work in the location where they will be employed