An experimental continued pretraining fork of DeepSeek-V3 exploring Japanese language adaptation. Research project — not production-ready.
Model Specifications
Base model: DeepSeek-V3
Total parameters: 671B
Active parameters per token: 37B (MoE)
Vocabulary: 145K tokens
Context window: 128K tokens
Context extension: YaRN
Architecture
The DeepSeek-V3 Mixture-of-Experts design is retained: 671B total parameters with 37B active per token, and the 128K-token context window is reached through YaRN-based RoPE scaling.
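A minimal sketch of how YaRN-style context extension is typically declared in a Hugging Face transformers config. The scaling factor and original context length below are illustrative assumptions, not values taken from this project.

```python
from transformers import AutoConfig

# Base-model config published by DeepSeek; custom code requires trust_remote_code.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)

# YaRN rescales RoPE frequencies so the model can attend beyond its original
# training context. Factor 32 over a 4K original window gives the 128K target;
# these particular numbers are illustrative assumptions, not the project's settings.
config.rope_scaling = {
    "type": "yarn",
    "factor": 32,
    "original_max_position_embeddings": 4096,
}
config.max_position_embeddings = 131072  # 128K-token context window
```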
Training Pipeline
Newly added Japanese tokens are first aligned with the existing embedding space before full-scale training; a common alignment recipe is sketched below.
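The exact alignment procedure is not detailed here. A minimal sketch, assuming the common recipe of initializing each new token's embedding from the mean of the base-tokenizer subword embeddings it decomposes into; `model`, `base_tokenizer`, and `extended_tokenizer` are placeholders.

```python
import torch

def align_new_token_embeddings(model, base_tokenizer, extended_tokenizer):
    """Initialize each newly added token's embedding as the mean of the
    base-tokenizer subword embeddings it decomposes into. This is one common
    recipe; the project's actual alignment method may differ."""
    base_vocab_size = len(base_tokenizer)
    model.resize_token_embeddings(len(extended_tokenizer))
    embeddings = model.get_input_embeddings().weight

    with torch.no_grad():
        for token_id in range(base_vocab_size, len(extended_tokenizer)):
            token_text = extended_tokenizer.decode([token_id])
            piece_ids = base_tokenizer.encode(token_text, add_special_tokens=False)
            if piece_ids:  # keep the default init if the decomposition is empty
                embeddings[token_id] = embeddings[piece_ids].mean(dim=0)
    # A fuller recipe would also align the output (LM-head) embeddings if untied.
    return model
```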
Full training on large-scale Japanese corpora follows the alignment stage.
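The corpora themselves are not listed in this section. The sketch below shows one way to stream a public Japanese corpus and pack it into fixed-length pretraining blocks; the dataset id, block size, and use of the base DeepSeek-V3 tokenizer are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative corpus and tokenizer ids; the project's actual data mix is not specified here.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)
corpus = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train", streaming=True)

def packed_blocks(dataset, block_size=4096):
    """Stream documents, tokenize, and yield fixed-length training blocks."""
    buffer = []
    for example in dataset:
        buffer.extend(tokenizer(example["text"], add_special_tokens=False)["input_ids"])
        buffer.append(tokenizer.eos_token_id)  # document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]

# First packed 4K-token block from the streamed corpus.
first_block = next(packed_blocks(corpus))
```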
Curriculum scheduling progresses from short to long contexts, and training is distributed with DeepSpeed and FSDP.
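A minimal sketch of a short-to-long context curriculum driven by training progress. The stage boundaries and sequence lengths are illustrative assumptions; only the final 128K window comes from the specifications above.

```python
def context_length_for_step(step: int, total_steps: int) -> int:
    """Step-wise short-to-long curriculum over training progress.
    Boundaries and lengths below are illustrative, not the project's settings."""
    progress = step / max(total_steps, 1)
    if progress < 0.5:
        return 4_096      # short contexts for early training
    if progress < 0.8:
        return 32_768     # mid-length contexts
    return 131_072        # full 128K window in the final phase

# The scheduler can select which length-bucketed dataloader feeds each step.
for step in (0, 60_000, 95_000):
    print(step, context_length_for_step(step, total_steps=100_000))
```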
Evaluation
JCommonsenseQA: Japanese commonsense reasoning
JNLI: Japanese natural language inference
MARC-ja: Japanese sentiment classification
Deployment
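Deployment details are not yet documented. As a placeholder, a minimal sketch of offline inference with vLLM, assuming a hypothetical checkpoint path; a 671B-parameter MoE model requires a multi-GPU (typically multi-node or quantized) setup in practice.

```python
from vllm import LLM, SamplingParams

# Hypothetical checkpoint path; tensor_parallel_size must match the available GPUs.
llm = LLM(
    model="path/to/japanese-adapted-checkpoint",
    trust_remote_code=True,
    tensor_parallel_size=8,
    max_model_len=131072,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["日本の首都はどこですか？"], params)
print(outputs[0].outputs[0].text)
```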