Chaitex
Product overview

Purpose-built accelerator for enterprise AI workloads

Designed for dense inference, model adaptation, and private AI infrastructure where predictable availability, localized supply, and software compatibility are critical.

Architecture
GPGPU/XCORE

Memory & Performance

High-bandwidth memory profile for AI inference and training.

VRAM capacity: 128 GB
Memory type: HBM3e
Memory bandwidth: 1200
Interconnect type: MetaXLink
Interconnect speed: 896

Architecture

Compute architecture and software execution model.

Architecture: GPGPU/XCORE
Compute units: -

Power & Thermal

Data center integration requirements.

Thermal design power: 850 W
Cooling: Passive
Form factor: OAM
Pixel Rate: -
Texture Rate: -

Benchmarks

Peak theoretical compute for common AI precisions.

FP32: 60
FP16: 480
TF32: 240
BF16 Tensor: 480
FP8 Tensor: -
INT8 Tensor: 960

Compatibility

Interfaces, frameworks, and deployment environment.

PCIe interface: PCIe 5.0 x16
Video encoding: -
Video decoding: -

Physical Dimensions

Card dimensions for server platforms.

Slots: -
Length: - mm
Height: - mm
Width: - mm
Pricing
On request
Volume pricing available for cluster deployments and pilot batches.
Product information

About

The MetaX C-588 is MetaX's flagship compute accelerator, built for AI workloads where memory capacity matters more than peak compute.

While most GPU vendors compete on teraflops, MetaX has bet on a different resource that matters just as much for modern AI: memory. Its flagship C-588 accelerator embodies this approach, offering 128 GB of HBM3e on a single card. This is a strategic choice as much as a technical one, because it simplifies the infrastructure needed for large language model inference. With this much memory, many models no longer need to be split across multiple cards, which reduces latency and simplifies service scaling.

In raw compute, the card sits between the NVIDIA A100 and H100, which suits it to tasks where memory capacity matters more than peak performance. Multiple accelerators are pooled over the proprietary MetaXLink interconnect, which provides high-speed data exchange. The card uses the OAM 2.0 form factor with passive cooling and draws 850 W, so it requires a server chassis with matching power delivery and airflow. The MetaX C-588 is therefore less a competitor in the FLOPS race and more a specialized tool for businesses that need to deploy large neural networks with minimal infrastructure costs.

The shift in focus from FLOPS to memory is a matter of operational economics as well as speed. Distributed inference, where one model is spread across multiple cards, requires complex software to synchronize data between GPUs. This adds overhead:

  • Data-transfer latency over the interconnect (for example, NVLink/NVSwitch).

  • The need for load balancing.

  • State synchronization (KV-cache) between nodes.

As a result, running one model on two cards often costs extra engineering effort and resources. A card with 128 GB of memory, such as the MetaX C-588, avoids the problem for any model that fits on it. It allows a simple "1 card = 1 instance" architecture, which lowers the skill level required of support engineers and simplifies service scaling.

For three years, the AI GPU market followed the logic of "more TFLOPS, more money." H100, MI300X, Blackwell: each new chip promised to double peak performance. But practice has shifted priorities.

For inference, which already accounts for over 50% of AI expenses in China, the critical factor is memory capacity, not peak compute speed. A 70B-parameter model takes ~140 GB in FP16 and ~35 GB with 4-bit GPTQ quantization. On top of the weights, memory is also needed for the KV-cache, batches, and system overhead.
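
The weight figures above follow from parameter count times bits per parameter. A minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes) and ignoring quantization metadata such as GPTQ scales and zero points, which add a small amount in practice. The function name `weight_footprint_gb` is illustrative and not part of any library:

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width and no
    extra metadata (scales, zero points) is counted.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# 70B parameters at 16 bits (FP16) and at 4 bits (e.g. GPTQ 4-bit).
fp16_gb = weight_footprint_gb(70e9, 16)
int4_gb = weight_footprint_gb(70e9, 4)

print(f"70B @ FP16 : {fp16_gb:.0f} GB")
print(f"70B @ 4-bit: {int4_gb:.0f} GB")
```

These are lower bounds for the weights alone; the KV-cache and runtime overhead come on top.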

On an 80 GB H100, this forces a trade-off: either aggressive quantization with some loss of quality, or distributed inference with increased latency.

128 GB changes the equation. A quantized model of up to 120B parameters can be hosted on a single card. One GPU = one instance, with minimum latency and minimum complexity. This is the package the MetaX C-588 offers.
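
Whether a model fits on one card depends on weights plus KV-cache plus overhead, not on weights alone. The sketch below is a rough capacity check under stated assumptions. The model shape (80 layers, 8 KV heads, head dimension 128) is an illustrative guess at a 70B-class model with grouped-query attention. The 4 GB `overhead_gb` default is an arbitrary placeholder. All names are hypothetical, not a vendor API:

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabyte, matching the figures in the text


@dataclass
class ModelShape:
    """Hypothetical transformer shape; the values used below are assumptions."""
    num_params: float
    num_layers: int
    num_kv_heads: int
    head_dim: int


def weights_gb(shape: ModelShape, bits_per_param: float) -> float:
    # Parameter count times bits per parameter, converted to gigabytes.
    return shape.num_params * bits_per_param / 8 / GB


def kv_cache_gb(shape: ModelShape, context_tokens: int, batch_size: int,
                bytes_per_value: int = 2) -> float:
    # Keys and values (factor 2) are cached per layer, per KV head, per token.
    per_token_bytes = (2 * shape.num_layers * shape.num_kv_heads
                       * shape.head_dim * bytes_per_value)
    return per_token_bytes * context_tokens * batch_size / GB


def fits(shape: ModelShape, bits_per_param: float, context_tokens: int,
         batch_size: int, capacity_gb: float, overhead_gb: float = 4.0) -> bool:
    # overhead_gb is a placeholder for activations, runtime buffers, etc.
    total = (weights_gb(shape, bits_per_param)
             + kv_cache_gb(shape, context_tokens, batch_size)
             + overhead_gb)
    return total <= capacity_gb


# Assumed 70B-class shape, 4096-token context, batch of 8 sequences.
shape_70b = ModelShape(num_params=70e9, num_layers=80, num_kv_heads=8, head_dim=128)

for bits in (16, 4):
    for capacity in (80.0, 128.0):
        ok = fits(shape_70b, bits, context_tokens=4096, batch_size=8,
                  capacity_gb=capacity)
        print(f"{bits:>2}-bit weights on {capacity:.0f} GB card: "
              f"{'fits' if ok else 'does not fit'}")
```

Under these assumptions the FP16 model fits on neither card, while the 4-bit model fits on both. The larger card leaves more room for longer contexts and bigger batches, and that headroom is what makes the single-card deployment practical.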