MetaX C-588 128GB - MetaX Integrated Circuits

Product overview

Purpose-built accelerator for enterprise AI workloads

Designed for dense inference, model adaptation, and private AI infrastructure where predictable availability, localized supply, and software compatibility are critical.

Architecture

GPGPU/XCORE

Memory & Performance

High-bandwidth memory profile for AI inference and training.

VRAM capacity: 128 GB

Memory type: HBM3e

Memory bandwidth: 1200

Interconnect type: MetaXLink

Interconnect speed: 896

Architecture

Compute architecture and software execution model.

Architecture: GPGPU/XCORE

Compute units: -

Power & Thermal

Data center integration requirements.

Thermal design power: 850 W

Cooling: Passive

Form factor: OAM

Pixel Rate: -

Texture Rate: -

Benchmarks

Peak theoretical compute for common AI precisions.

FP32: -

FP16: 480

TF32: 240

BF16 Tensor: 480

FP8 Tensor: -

INT8 Tensor: 960

Compatibility

Interfaces, frameworks, and deployment environment.

PCIe interface: PCIe 5.0 x16

Video encoding: -

Video decoding: -

Physical Dimensions

Card dimensions for server platforms.

Slots: -

Length: - mm

Height: - mm

Width: - mm

Pricing

On request

Volume pricing available for cluster deployments and pilot batches.

Product information

About

The MetaX C-588 is MetaX's flagship compute accelerator, designed around large on-card memory for AI workloads rather than around peak compute.

While most manufacturers of graphics accelerators compete on teraflops, MetaX has bet on a different, no less important resource for modern AI: memory. The flagship C-588 embodies this approach, offering 128 GB of HBM3e on a single card. This is a strategic choice as much as a technical one, because it simplifies the infrastructure needed for large language model inference. For a model that fits in this memory, there is no need for the complex and slower distribution of a single model across multiple cards, which reduces latency and simplifies the scaling of services.

In terms of "raw" compute power, the card sits between the NVIDIA A100 and H100, which suits it to tasks where memory capacity matters more than peak performance. Multiple accelerators are combined into a single pool over the proprietary MetaXLink interconnect, which provides high-speed data exchange. The card uses the OAM 2.0 form factor with passive cooling and draws 850 W, so it requires a server chassis that can supply the necessary airflow and power. The MetaX C-588 is therefore less a competitor in the teraflops race and more a specialized tool for businesses that need to deploy large neural networks with minimal infrastructure costs.

The shift in focus from FLOPS to memory is a question not only of speed but also of operational economics. Distributed inference, in which one model is spread across multiple cards, requires complex software to synchronize data between GPUs. This adds overhead in several places:

Data-transfer latency over the inter-GPU interconnect (for example, NVLink/NVSwitch).
The need for load balancing.
State synchronization (KV-cache) between nodes.

As a result, serving one model from two cards often takes more engineering effort and resources than serving it from one. For models that fit in its memory, a 128 GB card such as the MetaX C-588 avoids this problem entirely. It allows a simple "1 card = 1 instance" architecture, which lowers the expertise required of support engineers and simplifies service scaling.

For three years, the AI GPU market was driven by the logic of "more TFLOPS, more money." Each new chip (the H100, the MI300X, Blackwell) promised to double peak performance. But practice has shifted priorities.

For inference, which already accounts for over 50% of AI expenses in China, the critical factor is not peak compute speed but memory capacity. A 70B-parameter model takes about 140 GB in FP16 (2 bytes per parameter) and about 35 GB with 4-bit GPTQ quantization (0.5 bytes per parameter). On top of the weights, you also need space for the KV-cache, batches, and system overhead.
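The arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: it uses decimal gigabytes and ignores quantization metadata such as scales and zero points. The function names and the model shape passed to `kv_cache_gb` (80 layers, 8 KV heads, head dimension 128, 8192-token context, batch of 8) are illustrative assumptions, not figures published by MetaX.

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory taken by the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    # 1e9 parameters at 8 bits each occupy 1 GB, so the 1e9 factors cancel.
    return params_billion * bits_per_param / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in decimal GB for a decoder-only transformer."""
    # Factor 2: one tensor for keys and one for values in every layer.
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value
    return total_bytes / 1e9


print(weight_footprint_gb(70, 16))  # FP16 weights of a 70B model -> 140.0
print(weight_footprint_gb(70, 4))   # 4-bit weights of a 70B model -> 35.0

# Hypothetical model shape, chosen only to show the order of magnitude.
print(round(kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                        seq_len=8192, batch=8), 1))  # -> 21.5
```

With these assumed settings, the KV-cache alone adds roughly 21 GB on top of the weights, which is why weight size by itself understates the memory a serving instance needs.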

On an 80 GB H100, this forces an uncomfortable choice: either aggressive quantization with a loss of quality, or distributed inference with increased latency.
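The choice between quantizing and distributing can be illustrated with a small card-count estimate. This is a sketch under stated assumptions: `cards_needed` is a hypothetical helper, and the 20 GB reserve for KV-cache, batches, and system overhead is a placeholder, not a measured value.

```python
import math


def cards_needed(weights_gb: float, runtime_reserve_gb: float, card_gb: float) -> int:
    """Minimum number of cards whose combined memory holds weights plus reserve."""
    return math.ceil((weights_gb + runtime_reserve_gb) / card_gb)


RESERVE_GB = 20  # assumed placeholder for KV-cache, batches, system overhead

# 70B model on 80 GB cards: FP16 weights (~140 GB) versus 4-bit weights (~35 GB).
print(cards_needed(140, RESERVE_GB, 80))  # -> 2 (distributed inference)
print(cards_needed(35, RESERVE_GB, 80))   # -> 1 (fits, but only after 4-bit quantization)
```

The estimate ignores the extra memory and latency that tensor or pipeline parallelism itself introduces, so real multi-card deployments tend to need more headroom than this lower bound suggests.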

128 GB changes the equation. A quantized model of up to 120B parameters can be hosted on a single card (at 4 bits per parameter, the weights take about 60 GB). One GPU = one instance + minimum latency + minimum complexity. This is exactly the package offered by the MetaX C-588.
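How much room a 120B-parameter model leaves on a 128 GB card depends heavily on the quantization level. The following sketch computes the memory left after loading the weights; `headroom_gb` is a hypothetical helper, and the calculation ignores quantization metadata and any memory reserved by the driver or runtime.

```python
CARD_GB = 128  # on-card memory of the C-588, per the specification above


def headroom_gb(params_billion: float, bits_per_param: float,
                card_gb: float = CARD_GB) -> float:
    """Memory left for KV-cache and batches after loading the weights (decimal GB)."""
    weights_gb = params_billion * bits_per_param / 8
    return card_gb - weights_gb


for bits in (16, 8, 4):
    left = headroom_gb(120, bits)
    status = "fits" if left > 0 else "does not fit"
    print(f"120B at {bits}-bit: {left:+.0f} GB headroom ({status})")
```

Under these assumptions, a 120B model leaves about 68 GB free at 4 bits and only about 8 GB at 8 bits, and it does not fit at 16 bits, so the single-card scenario depends on the quantized format mentioned above.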