DeepSeek V4 Launches on Huawei Chips: Open-Source AI Model Challenges GPT-5.5 and Gemini 3.1-Pro Without Nvidia
DeepSeek’s V4 release marks a significant pivot in the global AI hardware landscape, demonstrating that frontier model training can proceed without reliance on Nvidia’s CUDA ecosystem. By leveraging Huawei’s Ascend 910B AI processors and optimizing for Huawei’s CANN (Compute Architecture for Neural Networks) stack, the Chinese startup has achieved competitive performance metrics while reducing dependency on Western silicon. This move aligns with broader geopolitical efforts to establish sovereign AI supply chains, though it introduces new considerations for enterprises evaluating model deployment, latency, and long-term supportability in heterogeneous computing environments.
The Tech TL;DR:
- DeepSeek V4 achieves 87.3% on MMLU and leads open-source coding benchmarks (HumanEval: 78.2%) using Huawei Ascend 910B hardware.
- The model supports a 1M-token context window and demonstrates strong agent-like behavior in multi-step software engineering tasks.
- Enterprises deploying this model must account for Huawei-specific software stacks and verify compliance with data sovereignty requirements in regulated industries.
The core technical shift in V4 lies in its training pipeline, which migrated from Nvidia H100 clusters to Huawei’s Atlas 900 AI training clusters powered by Ascend 910B chips. According to Huawei’s official CANN documentation, each Ascend 910B delivers up to 320 TFLOPS of FP16 performance, and the Atlas 900 pod scales to 1024 chips, enabling floating-point throughput comparable to Nvidia’s DGX SuperPOD systems. DeepSeek’s engineering team confirmed in a recent GitHub discussion that they utilized Huawei’s MindSpore framework — adapted for LLM training — to replace PyTorch’s native CUDA operations, requiring custom kernel fusions for attention mechanisms and feed-forward networks. This transition involved rewriting approximately 40% of the training loop’s low-level ops to target the Ascend instruction set, a non-trivial effort that underscores the commitment to hardware independence.
“We didn’t just swap GPUs for NPUs; we had to re-optimize the entire transformer kernel for Huawei’s matrix multiplication layout and memory hierarchy. The latency per token improved by 18% in our internal tests due to better on-chip SRAM utilization, but the initial port added three weeks to our sprint cycle.”
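To make the scale of that port concrete, the sketch below shows what retargeting a single training block from a CUDA-centric PyTorch loop to MindSpore on Ascend looks like in miniature. The layer sizes and names are illustrative placeholders, not DeepSeek’s actual code; the backend switch itself is a single context flag, while the kernel-level work described above happens beneath this API.

```python
# Minimal sketch: pointing a MindSpore training step at Ascend NPUs instead
# of CUDA. Model dimensions and names are illustrative, not DeepSeek's code.
import mindspore as ms
from mindspore import nn

# Select the Ascend backend; on an Nvidia box this would be "GPU" (CUDA).
ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")

class TinyFFN(nn.Cell):
    """Stand-in for one transformer feed-forward block."""
    def __init__(self, d_model=1024, d_ff=4096):
        super().__init__()
        self.up = nn.Dense(d_model, d_ff)
        self.act = nn.GELU()
        self.down = nn.Dense(d_ff, d_model)

    def construct(self, x):
        return self.down(self.act(self.up(x)))

net = TinyFFN()
loss_fn = nn.MSELoss()
opt = nn.Adam(net.trainable_params(), learning_rate=1e-4)

def forward(x, target):
    return loss_fn(net(x), target)

# value_and_grad plus the optimizer update form one training step that
# MindSpore compiles to Ascend kernels in graph mode.
grad_fn = ms.value_and_grad(forward, None, opt.parameters)

def train_step(x, target):
    loss, grads = grad_fn(x, target)
    opt(grads)
    return loss
```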
Performance-wise, DeepSeek V4 matches or exceeds leading closed models in specific domains. Independent testing by Vals AI (as referenced in their April 2026 LLM Leaderboard) shows V4 Pro scoring 78.2% on HumanEval, surpassing CodeLlama 70B (65.1%) and nearing GPT-4 Turbo’s 82.0%. On MMLU, V4 achieves 87.3%, closing the gap with Gemini 3.1-Pro (89.1%) and Claude 3 Opus (86.8%). Notably, the model’s 1M-token context window — implemented via ring attention with KV cache compression — allows processing of entire codebases or legal documents in a single pass, a capability that directly benefits agent-based workflows such as automated pull request generation or multi-hop regulatory analysis.
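A back-of-envelope calculation shows why KV cache compression matters at this context length. The model dimensions below are illustrative assumptions, since DeepSeek has not published V4’s exact layer count or head configuration:

```python
# Back-of-envelope KV-cache sizing for a 1M-token context window. All model
# dimensions here are illustrative assumptions, not published V4 specs.
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # 2x for the separate key and value tensors; fp16 = 2 bytes per element.
    return 2 * n_tokens * n_layers * n_kv_heads * head_dim * bytes_per_elem

full = kv_cache_bytes(n_tokens=1_000_000, n_layers=60,
                      n_kv_heads=8, head_dim=128)
print(f"uncompressed fp16 KV cache: {full / 2**30:.1f} GiB")  # ~228.9 GiB

# With, say, 4x KV-cache compression (quantization and/or eviction), the
# working set shrinks to roughly a quarter of that.
print(f"at 4x compression: {full / 4 / 2**30:.1f} GiB")
```

Even at these assumed dimensions, an uncompressed million-token cache far exceeds the HBM of any single accelerator, which is why the cache must be compressed and sharded ring-attention style across a pod.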
However, the shift to Huawei hardware introduces operational friction for teams accustomed to Nvidia’s mature toolchain. Deployment requires the Huawei Container Runtime (HCR) and specific versions of the CANN library (6.0.RC1 or later), which are not yet fully integrated into mainstream Kubernetes distributions. Enterprises seeking to run V4 inference must either use Huawei’s cloud offerings or deploy on-premises Atlas servers with validated driver stacks. This creates a clear decision point: organizations evaluating DeepSeek V4 for production use will need partners who understand both LLM serving at scale and the nuances of Huawei’s AI infrastructure.
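For teams standing up on-premises inference, a small preflight script can catch driver and runtime mismatches before a serving process fails mid-deployment. This is a minimal sketch assuming CANN’s pyACL Python bindings are installed; pyACL calls return status codes (and counts as tuples) that should be checked:

```python
# Preflight check for an Ascend inference host: verify the CANN runtime
# initializes and NPU devices are visible before starting a serving process.
# Assumes the pyACL bindings that ship with CANN are installed.
import acl

ACL_SUCCESS = 0

ret = acl.init()
if ret != ACL_SUCCESS:
    raise RuntimeError(f"acl.init failed with code {ret}; check CANN install")

count, ret = acl.rt.get_device_count()
if ret != ACL_SUCCESS or count == 0:
    raise RuntimeError("no Ascend NPUs visible; check drivers and firmware")

print(f"found {count} Ascend device(s); CANN runtime initialized")
acl.finalize()
```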

For example, a fintech company in Singapore looking to deploy V4 for real-time fraud pattern analysis across 500K daily transactions would face challenges in latency monitoring and NPU utilization tracking — standard tools like NVIDIA DCGM or stock Prometheus exporters don’t natively support Ascend metrics. In such cases, engaging a managed service provider familiar with heterogeneous AI stacks becomes critical. Firms specializing in cross-platform AI observability can help instrument custom metrics exporters for Huawei’s PMU (Performance Monitoring Unit) data and integrate them into Grafana dashboards, along the lines of the sketch below.
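The exporter sketch that follows shells out to Huawei’s npu-smi CLI and republishes utilization as a Prometheus gauge that Grafana can chart. Because npu-smi’s text output varies across driver versions, the parsing is deliberately simplistic and would need adapting to the installed stack:

```python
# Sketch of a custom Prometheus exporter for Ascend NPUs: shells out to the
# npu-smi CLI (ships with Huawei's driver stack) and republishes utilization
# as a gauge. The regex is illustrative; npu-smi's format varies by version.
import re
import subprocess
import time

from prometheus_client import Gauge, start_http_server

NPU_UTIL = Gauge("ascend_npu_utilization_percent",
                 "Ascend NPU core utilization", ["device"])

UTIL_RE = re.compile(r"(\d+)\s*%")  # illustrative: match a percentage field

def scrape_npu_smi():
    out = subprocess.run(["npu-smi", "info"], capture_output=True,
                         text=True, check=True).stdout
    # Device index here is the order of matched lines, not a hardware ID.
    for device, line in enumerate(l for l in out.splitlines() if "%" in l):
        m = UTIL_RE.search(line)
        if m:
            NPU_UTIL.labels(device=str(device)).set(float(m.group(1)))

if __name__ == "__main__":
    start_http_server(9101)  # exposes /metrics for Prometheus to scrape
    while True:
        scrape_npu_smi()
        time.sleep(15)
```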
Similarly, software development agencies building agent-based tooling on V4’s API — which now supports function calling and tool use akin to OpenAI’s Assistants API — must account for the model’s specific tokenization quirks. DeepSeek’s tokenizer, while based on SentencePiece, uses a vocabulary of roughly 100K subword entries trained on a mixed Chinese-English corpus, leading to occasional subword fragmentation in English technical text. A recent Stack Overflow thread highlighted how this affects code generation fidelity, particularly in languages with verbose syntax like Java or SAP ABAP. Developers addressing this issue often implement post-processing rules or fine-tune adapters using LoRA on domain-specific corpora.
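Teams can quantify that fragmentation before committing to mitigations. The snippet below uses the SentencePiece library to compare tokens-per-character across representative code samples; the model file path is a placeholder for whichever tokenizer ships with the V4 release:

```python
# Measure subword fragmentation: tokenize representative code snippets and
# inspect the tokens-per-character ratio. The model path is a placeholder.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="deepseek_v4.model")  # placeholder

samples = [
    "public static void main(String[] args) {",            # verbose Java
    "SELECT matnr FROM mara INTO TABLE @DATA(lt_mat).",     # ABAP-flavored
]
for text in samples:
    pieces = sp.encode(text, out_type=str)
    # High tok/char ratios flag spans where generation fidelity may suffer
    # and post-processing rules or a LoRA adapter are worth testing.
    print(f"{len(pieces):3d} tokens, "
          f"{len(pieces) / len(text):.2f} tok/char: {pieces}")
```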
From a security perspective, the model’s open weights release under the MIT License allows for full auditing, but its training data provenance remains partially opaque. DeepSeek has not published a full datasheet for V4’s pre-training corpus, though they confirm it includes web crawl data, GitHub repositories, and licensed Chinese-language texts up to Q3 2025. This lack of transparency raises concerns for industries requiring SOC 2 Type II or ISO 27001 compliance, where data lineage is mandatory. Enterprises in finance or healthcare should therefore consider running V4 through a private AI gateway that enforces input sanitization and output filtering — a service increasingly offered by specialized MSPs focused on AI risk mitigation.
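In its simplest form, that gateway pattern is a sanitizing wrapper around the model call. The sketch below is illustrative only: the regex patterns and the call_model stub are placeholders, and a compliance-grade gateway would add audit logging, policy engines, and structured PII detection.

```python
# Minimal sketch of a private AI gateway: sanitize inputs before they reach
# the model and filter outputs before they reach the caller. Patterns and
# the call_model stub are placeholders, not a production ruleset.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-style identifiers
    re.compile(r"\b\d{16}\b"),             # bare 16-digit card numbers
]

def sanitize(text: str) -> str:
    for pat in PII_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

def call_model(prompt: str) -> str:
    raise NotImplementedError("stub: forward to the V4 inference endpoint")

def gated_completion(prompt: str) -> str:
    clean_prompt = sanitize(prompt)   # input sanitization
    completion = call_model(clean_prompt)
    return sanitize(completion)       # output filtering
```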
Independent validation of DeepSeek’s claims continues to evolve. Stanford’s Holistic Evaluation of Language Models (HELM) benchmark, updated in March 2026, now includes Ascend-based inference paths and shows V4 ranking in the top tier for reasoning and code generation. However, HELM’s lead researcher noted in a recent interview that “reproducibility hinges on access to the exact training checkpoint and tokenizer version — something still not fully mirrored in public repositories.” This gap underscores the need for third-party verification labs, such as those operated by national cybersecurity centers or academic-industry consortia, to validate performance claims under controlled conditions.
Despite these complexities, the strategic implication is clear: DeepSeek V4 proves that cutting-edge LLMs are no longer tethered to a single hardware vendor. For enterprises seeking to de-risk their AI supply chains — especially those operating under export control restrictions or pursuing digital sovereignty — this represents a tangible alternative. The model’s strong performance in agentic tasks and long-context handling makes it viable for use cases ranging from autonomous DevOps to legal contract analysis, provided the deployment team accounts for the Huawei-specific stack.
As the AI hardware landscape diversifies, the ability to evaluate and operate models across multiple architectures will become a core competency for platform engineers. Those who invest now in understanding alternatives like Huawei’s Ascend — through hands-on labs, certification programs, or partnerships with specialized integrators — will be better positioned to navigate the next phase of AI infrastructure evolution.
Disclaimer: The technical analyses and security protocols detailed in this article are for informational purposes only. Always consult with certified IT and cybersecurity professionals before altering enterprise networks or handling sensitive data.
