zFABRIC™

Break free from vendor constraints and turn flexibility into strategic advantage.

zFABRIC

zFABRIC seamlessly interconnects multiple GPUs across servers or nodes to create a unified, high-performance cluster. It maximizes throughput and minimizes latency for GPU-to-GPU communication, which is essential for distributed training, model parallelism, and data parallelism.
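
Why both throughput and latency matter can be made concrete with the standard ring all-reduce cost model, the collective communication pattern that dominates data-parallel training traffic. The figures below (64 GPUs, 400 Gb/s links, a 1 GiB gradient buffer) are illustrative assumptions for the sketch, not zFABRIC measurements:

```python
# Back-of-the-envelope cost model for a ring all-reduce.
# All numbers are illustrative assumptions, not benchmarks.

def ring_allreduce_time(n_gpus, msg_bytes, link_gbps, latency_us):
    """Estimate one all-reduce in seconds: 2*(N-1) steps, each sending
    a chunk of size msg_bytes/N, paying per-step latency plus transfer time."""
    steps = 2 * (n_gpus - 1)
    chunk = msg_bytes / n_gpus
    bandwidth = link_gbps * 1e9 / 8  # link speed in bytes per second
    return steps * (latency_us * 1e-6 + chunk / bandwidth)

# Example: 1 GiB of gradients across 64 GPUs on 400 Gb/s links,
# comparing a low-latency fabric against a higher-latency one.
t_fast = ring_allreduce_time(64, 2**30, 400, latency_us=5)
t_slow = ring_allreduce_time(64, 2**30, 400, latency_us=50)
print(f"5 us links:  {t_fast * 1e3:.1f} ms per all-reduce")
print(f"50 us links: {t_slow * 1e3:.1f} ms per all-reduce")
```

Because each step pays the per-message latency in full, even a few tens of microseconds of extra latency per hop adds measurably to every all-reduce, which is repeated on every training iteration.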

zFABRIC lets you use the GPUs you choose without sacrificing performance or compatibility. With straightforward scaling across mixed hardware, zFABRIC transforms sourcing flexibility into tangible cost savings.

Sovereign-Grade Networking

1
Heterogeneous Cluster

zFABRIC unifies diverse GPU and accelerator types into a coherent networking fabric, enabling flexible scaling across mixed hardware generations and architectures.

2
Supply Chain Independence

Designed to work across multiple technology and switch vendors, zFABRIC ensures consistent performance without locking customers into a single hardware ecosystem.

3
High Resilience Fabric

zFABRIC enhances network reliability with advanced failover, rapid recovery, and health checks, ensuring stable, predictable performance for AI workloads at scale.

Give us a Call
+1 650 260 1009

Frequently Asked Questions

Helpful information and answers related to the product.

What is zFABRIC?

zFABRIC is a high-performance RDMA networking solution purpose-built for AI and GPU clusters. Think of it as fitting a race car with high-quality, non-dealer performance parts: zFABRIC enables AI clusters to scale efficiently across racks and data centers without relying on closed or vendor-specific networking. It delivers the performance required for distributed AI training while giving operators flexibility in hardware sourcing, faster deployments, and lower long-term operating costs.

How does zFABRIC improve total cost of ownership (TCO)?

zFABRIC lowers CAPEX and OPEX for our customers by enabling mixed hardware generations, supporting multiple network vendors, and reducing downtime through automated recovery. Customers who deploy zFABRIC avoid vendor lock-in, extend hardware lifespan, and reduce operational overhead, significantly improving TCO.

How does zFABRIC improve reliability and uptime?

zFABRIC is designed to keep AI systems productive even when underlying components fail. Through automated failover, continuous link health monitoring, intelligent rerouting, and rapid recovery, zFABRIC minimizes disruption to training and inference workloads. This reduces GPU idle time, protects delivery timelines, and allows operators to meet SLA expectations with minimal manual intervention, resulting in more predictable operations, fewer costly interruptions, and a lower mean time to recovery (MTTR).

Is zFABRIC limited to NVIDIA GPUs?

No. zFABRIC is vendor-agnostic and supports heterogeneous GPU and accelerator environments based on open RDMA standards such as RoCEv2. This allows organizations to deploy and operate AI infrastructure using NVIDIA, AMD, or other accelerators without being locked into a single vendor ecosystem. As a result, customers can source hardware more flexibly, extend the usable life of existing assets, adapt faster to supply or pricing changes, and reduce long-term infrastructure costs while maintaining consistent performance at scale.

Which networking protocol does zFABRIC use and why?

zFABRIC primarily uses RoCEv2 (RDMA over Converged Ethernet v2) to deliver high-performance GPU networking on standard Ethernet infrastructure. This enables near-InfiniBand performance while using widely available switches, optics, and cabling. As a result, customers can deploy AI clusters more quickly, scale across vendors and sites with less friction, and achieve high performance without the cost and constraints of proprietary networking stacks.

How many GPUs does zFABRIC support?

zFABRIC is designed to scale from thousands to hundreds of thousands of GPUs within a single AI environment. Scaling limits are determined by physical factors such as optics speed, switch capacity, and data center power and cooling, not by the zFABRIC software itself. This allows organizations to start at practical cluster sizes and expand over time without redesigning the network, reducing deployment delays, protecting existing investments, and avoiding premature infrastructure replacement.
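
To see how switch capacity, rather than software, bounds cluster size, consider the classic k-ary fat-tree (three-tier Clos) capacity formula from Al-Fares et al., which supports k³/4 hosts using only k-port switches. The radix values below are illustrative assumptions; actual zFABRIC topologies may differ:

```python
# Host capacity of a k-ary fat-tree built entirely from k-port switches.
# Radix values are illustrative; real deployments vary in topology.

def fat_tree_hosts(radix):
    """Maximum hosts in a 3-tier k-ary fat-tree: k^3 / 4."""
    return radix**3 // 4

for k in (32, 64, 128):
    print(f"{k}-port switches -> up to {fat_tree_hosts(k):,} hosts")
```

With commodity 64-port switches this already reaches tens of thousands of endpoints, consistent with the "thousands to hundreds of thousands of GPUs" range above.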

Can zFABRIC support cross data center AI clusters?

Yes, zFABRIC enables AI training and inference to run across geographically distributed data centers, allowing organizations to scale beyond a single site without redesigning their network. This makes it possible to bring capacity online faster, use existing facilities more effectively, and avoid costly overbuild in one location. By supporting long-distance interconnection with production-ready designs, zFABRIC allows teams to operate distributed AI systems reliably while improving utilization and lowering the total cost of scaling AI infrastructure.

testimonials

Customer Success Story

Features

We empower sovereigns to build AI infrastructure without geopolitical exposure, vendor lock-in, or dependency.

"For our internal model training, Zettabyte’s zSUITE delivered meaningful improvements across our AI infrastructure operations, particularly in GPU utilization, cluster visibility, and operational efficiency.

Among the open-source platforms we have tested, zSUITE provides better performance for managing and scaling complex AI workloads. We view zSUITE as a strong software foundation for next-generation AI infrastructure."

David Shen
COO, Wistron Group

"Zettabyte’s software has been instrumental in helping WiAdvance’s enterprise customers deploy and scale AI with confidence. By simplifying GPU management and improving utilization and visibility, Zettabyte enables organizations to move from pilot projects to production AI faster and more efficiently.

It has become a key enabler for enterprises looking to expand their AI capabilities while maintaining reliability and operational control."

Michael Hsia
CEO, WiAdvance

"Zettabyte played a key role in helping the Foxbrain team accelerate our LLM training efforts. The platform delivered tangible performance improvements that shortened training cycles, while its developer-centric features made it easier for our engineers to iterate, debug, and optimize workloads.

With better visibility and control across our GPU infrastructure, we were able to move faster from experimentation to large-scale training with confidence."

Tran Nhiem
Technical Lead, Foxconn

"Working with this team completely changed our deployment timeline. Their AI-optimized data hall design cut months off our build schedule and saved us from costly rework. Truly exceptional support from start to finish."

Alex Roberts
Project Manager, Foxconn

"Their system unified all of our GPU nodes into a single, easy-to-manage environment. The automation features alone saved our team countless hours each week. The ROI was immediate."

Jordan Parker
AI Team Lead, Fortune 500 Technology Company

"Their monitoring layer helped us eliminate bottlenecks we didn’t even know we had. Workloads balance perfectly, failures self-correct, and our team spends less time babysitting jobs. Exceptional product and exceptional support."

Alex Roberts
AI Engineer, Leading Semiconductor Company