What You’ll Do

Design and build distributed systems powering the zCLOUD platform
Develop control planes for GPU and resource orchestration
Build scalable, fault-tolerant services for multi-tenant environments
Optimize system reliability, latency, and throughput at scale
Collaborate with GPU systems and research teams on integration
Implement observability, monitoring, and debugging capabilities
Own services from design through production operation
Document system architecture and operational best practices

You’ll Thrive Here if You

Have 5+ years of experience building distributed systems
Possess a strong understanding of concurrency, consistency, and fault tolerance
Have experience designing APIs and service-oriented architectures
Are comfortable operating and debugging systems in production
Demonstrate a strong ownership mindset and engineering rigor

Bonus Qualifications

Experience with cloud platforms or large-scale infrastructure
Familiarity with scheduling or resource management systems
Experience supporting AI or data-intensive workloads
Background in platform or control-plane engineering

Why This Role is Unique

You will help build the backbone of a new cloud paradigm designed specifically for GPU-driven AI workloads.

Details

Competitive salary and equity based on experience and skill set
Flexible work environment
Applicants must be authorized to work in their respective location

Memberships & Commitments

Zettabyte is a supplier of the United Nations Global Marketplace (UNGM #1117361) and is committed to aiding the United Nations in achieving its Sustainable Development Goals.

Distributed Systems Engineer – zCLOUD