What You’ll Do
- Build infrastructure layers supporting large-scale AI workloads
- Integrate compute, networking, and storage for GPU-intensive systems
- Support training and inference pipelines in production environments
- Optimize performance, reliability, and cost efficiency
- Collaborate with research and platform teams on system design
- Automate deployment and scaling of AI infrastructure
- Troubleshoot complex infrastructure and workload issues
- Document deployment architectures and operational workflows
You’ll Thrive Here If You
- Have 4+ years of experience in infrastructure or systems engineering
- Have hands-on experience running AI workloads in production
- Possess a strong understanding of distributed infrastructure trade-offs
- Are comfortable working across software and hardware boundaries
- Are execution-focused with strong debugging skills
Bonus Qualifications
- Experience with containerization and orchestration systems
- Familiarity with ML frameworks and AI pipelines
- Experience with high-throughput networking or storage systems
- Background in data centers or large-scale compute platforms
Why This Role Is Unique
You will enable AI workloads to run reliably and efficiently at scale, bridging research, platform engineering, and real-world deployment.
Details
- Competitive salary and equity based on experience and skill set
- Flexible work environment
- Applicants must be authorized to work in the location where they will be employed