Zplatform

Unleash the full power of Zware through an intuitive, user-friendly interface

Designed for AI engineers of all levels, Zplatform enables seamless task execution, streamlined setup, and effortless access to token-based APIs, without the complexity.

Zplatform Performance Metrics: Launch AI Workloads 30% Faster
For Cloud & AI Service Providers

Zware AICloud

An intelligent computing control and scheduling platform designed for large AI model pre-training

Built For

  • GPU cloud service providers (IaaS/PaaS)
  • AI model training platforms (MLaaS)
  • Multi-tenant GPU hosting providers
  • Enterprise AI platforms

Key Metrics

  • 5,000+ tenants
  • 10,000+ GPU cards supported
  • 99.9% task success rate
  • Up to 20% improvement in MFU (Model FLOPs Utilization)
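MFU compares the FLOPs a training job actually spends on the model against the hardware's theoretical peak. The page does not say how Zware measures it, so the following is only a minimal sketch assuming the common rule of thumb of about 6 × N FLOPs per token (forward plus backward pass) for a dense transformer with N parameters; attention FLOPs are ignored, and the function names and example numbers are hypothetical.

```python
def training_flops_per_token(n_params: float) -> float:
    """Approximate forward + backward FLOPs per token for a dense transformer.

    Uses the 6 * N rule of thumb; attention FLOPs are ignored.
    """
    return 6.0 * n_params


def model_flops_utilization(
    n_params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """MFU = achieved model FLOP/s divided by the cluster's theoretical peak FLOP/s."""
    achieved = training_flops_per_token(n_params) * tokens_per_second
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical numbers chosen for easy arithmetic, not measurements.
    mfu = model_flops_utilization(
        n_params=1e9, tokens_per_second=50_000, num_gpus=1, peak_flops_per_gpu=1e15
    )
    print(f"MFU = {mfu:.1%}")  # MFU = 30.0%
```

Because the denominator is fixed by the hardware, MFU rises only when the same cluster processes more tokens per second, which is why scheduling, communication, and fault-recovery overheads show up directly in this metric.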

Background

As AI technology advances rapidly, computing infrastructure for large models has become a key pillar of digital transformation and a major driver of the digital economy.

Training large AI models and running related workloads requires large-scale intelligent computing centers built from thousands, tens of thousands, or even hundreds of thousands of GPUs. These GPUs must work together to provide enough computing power to process and update the massive number of model parameters.

Key Challenges

Clusters at this scale face challenges of ultra-large scale, numerous configurations, high performance requirements, and fine-grained resource management. The central problem is how to manage and utilize computing resources efficiently while completing large model training tasks reliably.

Product Architecture

The Zware-AICloud platform is designed for large AI model pre-training and cluster control and scheduling, delivering efficiency through end-to-end intelligent computing capabilities.

  • End-to-end Integration
  • Heterogeneous Adaptation
  • Intelligent Monitoring
Figure: AI Cloud Product Architecture

Core Features

Task Submission and Related Services

A visual interface supports custom settings for distributed tasks, with built-in PyTorch and MPI frameworks. It also includes comprehensive task, storage, and container image management services.

Large-Scale Distributed Scheduling

Powerful distributed scheduling engine supporting thousands to tens of thousands of cards. Features priority scheduling, reclamation strategies, preemptive scheduling, and fault-tolerant task restart scheduling.
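The page names priority and preemptive scheduling but does not describe Zware's algorithm. As a minimal sketch only, the following assumes a single pool of interchangeable GPUs (no node topology, gang placement, or fairness) and preempts the lowest-priority running jobs until a higher-priority job fits. `Job`, `Cluster`, and `submit` are hypothetical names; in a real system, preempted jobs would be requeued and later restarted, typically from their last checkpoint.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare jobs by identity, not by field values
class Job:
    name: str
    priority: int  # higher value = more important
    gpus: int


@dataclass
class Cluster:
    total_gpus: int
    running: List[Job] = field(default_factory=list)

    def free_gpus(self) -> int:
        return self.total_gpus - sum(j.gpus for j in self.running)

    def submit(self, job: Job) -> Optional[List[Job]]:
        """Place `job`, preempting lower-priority jobs if necessary.

        Returns the preempted jobs (possibly an empty list), or None if the
        job cannot be placed even after preempting every lower-priority job.
        """
        freed = self.free_gpus()
        victims: List[Job] = []
        # Consider the least important running jobs first.
        candidates = sorted(
            (j for j in self.running if j.priority < job.priority),
            key=lambda j: j.priority,
        )
        for cand in candidates:
            if freed >= job.gpus:
                break
            victims.append(cand)
            freed += cand.gpus
        if freed < job.gpus:
            return None  # cluster left unchanged; the job stays queued
        self.running = [j for j in self.running if j not in victims]
        self.running.append(job)
        return victims


if __name__ == "__main__":
    cluster = Cluster(total_gpus=8)
    cluster.submit(Job("batch-low", priority=1, gpus=6))
    cluster.submit(Job("eval-mid", priority=2, gpus=2))
    preempted = cluster.submit(Job("pretrain-high", priority=3, gpus=4))
    print([j.name for j in preempted])  # ['batch-low']
```

Evicting the least important jobs first keeps higher-priority work running; a production scheduler would also weigh how much progress a victim loses and whether its GPUs are placed well for the incoming job.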

Heterogeneous and Long-Distance Control

Unified cluster management of heterogeneous computing power with real-time resource usage monitoring. Supports cross-data-center large model training with optimized long-distance scheduling.

Automatic Fault Detection and Alerts

Integrates with control and maintenance systems to capture operational status automatically. Provides comprehensive network monitoring with automatic fault detection, anomaly alarms, and equipment inspection.
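The page does not explain how faults are detected. One common building block is a heartbeat timeout, in which a node that has not reported within a deadline is flagged as failed. The following is a minimal sketch under that assumption. `HeartbeatMonitor` and its methods are hypothetical, and the clock is injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Flags nodes whose most recent heartbeat is older than `timeout_s` seconds."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so tests need not sleep
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        """Record that `node` reported in just now."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> List[str]:
        """Return nodes that have missed the heartbeat deadline, sorted by name."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(timeout_s=30.0, clock=lambda: fake_now[0])
    monitor.heartbeat("node-a")
    monitor.heartbeat("node-b")
    fake_now[0] = 20.0
    monitor.heartbeat("node-a")  # node-b stays silent
    fake_now[0] = 45.0
    print(monitor.failed_nodes())  # ['node-b']
```

A flagged node would then trigger an alarm and, for affected training jobs, a fault-tolerant restart on healthy hardware. Choosing the timeout trades detection speed against false alarms from transient network delays.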

User Value

Production Proven Excellence

With Zware-AICloud, users can control and schedule ultra-large-scale intelligent computing clusters with automatic fault tolerance. The platform is currently deployed in multiple large-scale intelligent computing clusters, supporting up to 2000P (PFLOPS) of computing power in a single cluster.

Large-Scale Distributed Scheduling

Built-in powerful distributed scheduling engine supporting thousands to tens of thousands of cards with priority scheduling, reclamation strategies, and fault-tolerant restart scheduling.

Heterogeneous Computing Support

Unified scheduling of heterogeneous computing power with targeted adaptation for different manufacturers' GPU cards, enabling collaborative multi-vendor training.

Fault Prediction & Recovery

Real-time monitoring and predictive maintenance detect potential problems in advance, reducing system failures and downtime.

Congestion Control

Automatically tunes the parameters of DCQCN (Data Center Quantized Congestion Notification) congestion control, with a distributed architecture that supports dynamic scaling and load balancing.
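DCQCN is the congestion control scheme commonly used for RoCEv2 RDMA traffic (introduced by Zhu et al. at SIGCOMM 2015). The page states that Zware tunes its parameters automatically but not which parameters or how. The following simplified sketch of the sender-side (reaction point) rate rules shows the kind of knobs a tuner would adjust: the EWMA gain `g`, the additive and hyper increase steps, and the fast-recovery length `F`. It is a simplification: the separate timer and byte counter of the original design are merged into one event counter, and the class names and default values are hypothetical placeholders rather than recommended settings.

```python
from dataclasses import dataclass


@dataclass
class DcqcnParams:
    """Tunable knobs; defaults are placeholders, not recommended values."""
    g: float = 1 / 256             # EWMA gain for the congestion estimate alpha
    rate_ai_gbps: float = 5.0      # additive-increase step
    rate_hai_gbps: float = 50.0    # hyper-increase step
    fast_recovery_steps: int = 5   # F: increase events spent in fast recovery
    line_rate_gbps: float = 100.0


class DcqcnSender:
    """Simplified DCQCN reaction point (sender-side rate control)."""

    def __init__(self, params: DcqcnParams):
        self.p = params
        self.rc = params.line_rate_gbps  # current sending rate
        self.rt = params.line_rate_gbps  # target rate to recover toward
        self.alpha = 1.0                 # estimate of congestion severity
        self.increase_events = 0

    def on_cnp(self) -> None:
        """A Congestion Notification Packet arrived: remember the rate, then cut it."""
        self.rt = self.rc
        self.rc *= 1 - self.alpha / 2
        self.alpha = (1 - self.p.g) * self.alpha + self.p.g
        self.increase_events = 0

    def on_alpha_timer(self) -> None:
        """No CNP during the last alpha-update window: decay alpha."""
        self.alpha *= 1 - self.p.g

    def on_increase_event(self) -> None:
        """Rate-increase timer or byte-counter expiry (merged into one counter here)."""
        self.increase_events += 1
        f = self.p.fast_recovery_steps
        if self.increase_events > 2 * f:
            self.rt += self.p.rate_hai_gbps   # hyper increase
        elif self.increase_events > f:
            self.rt += self.p.rate_ai_gbps    # additive increase
        # During the first F events (fast recovery), rt is left unchanged.
        self.rt = min(self.rt, self.p.line_rate_gbps)
        self.rc = (self.rt + self.rc) / 2     # move halfway toward the target


if __name__ == "__main__":
    sender = DcqcnSender(DcqcnParams())
    sender.on_cnp()
    print(sender.rc)  # 50.0
    sender.on_increase_event()
    print(sender.rc)  # 75.0
```

A larger `g` makes alpha, and therefore the size of each rate cut, react faster to congestion, while larger increase steps reclaim bandwidth sooner at the risk of renewed queue build-up. Balancing these effects is what makes automatic tuning attractive at cluster scale.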

Ready to Transform Your AI Infrastructure?

Experience the power of Zware AICloud and unlock the full potential of your large-scale AI workloads.