ChaiTex
ChaiTexGPU Solutions
Back to GPU Cards
YH001 PCIe Accelerator Card 48GB 1

YH001 PCIe Accelerator Card 48GB

Next-generation AI chip designed as a foundation for cloud computing and large-scale language models (LLMs). It’s not just an accelerator, but a full architectural platform focused on matrix efficiency, scalability, and flexibility for custom AI workloads. At its core is a hybrid instruction set approach: RISC-V (base) with the RVV (vector extension), enhanced by custom matrix instructions and a proprietary Virtual Instruction Set Architecture (VISA). This provides a key advantage — the ability to finely tune execution for specific models and algorithms, unlike the fixed instruction sets found in traditional GPUs. From a compute perspective, the chip follows a TPU-like architecture. It features dual systolic array matrix engines optimized for dense linear algebra operations typical in LLMs and deep learning. Complementing this is a high-performance 4D DMA engine, addressing one of the main bottlenecks in modern accelerators: data movement. As a result, the design achieves high efficiency in both computation and memory transfer. A strong emphasis is placed on optimization for large models, particularly architectures similar to DeepSeek. The chip supports Blocked FP8 precision, enabling significant reductions in memory usage and increased throughput without critical accuracy loss — especially important for both training and inference at scale. For scalability, the chip uses a proprietary YHLink interconnect. Positioned as an alternative to NVIDIA NVLink, it is designed for building large-scale clusters and supports advanced features like In-Network Computing. This allows certain operations to be executed directly within the network, reducing latency and offloading compute from the chips themselves. Overall, this is a data center–class AI processor tailored for: large language models (LLMs) distributed training high-throughput inference scalable AI cluster deployments The core idea is a shift away from general-purpose GPU architectures toward deep vertical optimization for AI workloads, where not only FLOPS matter, but also efficiency in memory access, interconnect, and custom data formats."

VRAM
48
FP32
64
TDP
-
Interface
PCIe 5.0x16
Pre-order
Product status

Memory Specifications

VRAM
48
VRAM Type
HBM3e
Memory Bandwidth
1000
Interface
PCIe 5.0x16
Interconnect Type
YHLink
Interconnect Speed
600

Computing Performance

FP64 VectorINT4 TOPS 1024
FP32 Vector64
FP16 Vector-
TF32 Tensor-
FP/BF16 Tensor256
FP8 Tensor512
INT8 Tensor512

Graphics & Thermal

Pixel Rate (GPixels/s)
-
Texture Rate (GTexels/s)
-
TDP
-
Cooling Type
Пассивное
Form Factor
PCIe Card
Video Encoding
-
Video Decoding
-

Architecture

Architecture
TPU Архитектура
Cores
-
Price
On request