An experimental continued pretraining fork of DeepSeek-V3 exploring Japanese language adaptation. Research project — not production-ready.
Model Specifications
Base model: DeepSeek-V3
Total parameters: 671B
Active parameters per token: 37B (MoE)
Vocabulary: 145K tokens
Context window: 128K tokens
Context extension: YaRN
Architecture
The DeepSeek-V3 Mixture-of-Experts design is retained: 671B total parameters with 37B active per token, and the 128K-token context window is reached through YaRN-based RoPE scaling.
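A minimal sketch of how YaRN-style context extension is typically declared in a Hugging Face transformers config. The scaling factor and original context length below are illustrative assumptions, not values taken from this project.

```python
from transformers import AutoConfig

# Base-model config published by DeepSeek; custom code requires trust_remote_code.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)

# YaRN rescales RoPE frequencies so the model can attend beyond its original
# training context. Factor 32 over a 4K original window gives the 128K target;
# these particular numbers are illustrative assumptions, not the project's settings.
config.rope_scaling = {
    "type": "yarn",
    "factor": 32,
    "original_max_position_embeddings": 4096,
}
config.max_position_embeddings = 131072  # 128K-token context window
```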
Training Pipeline
Newly added Japanese tokens are first aligned with the existing embedding space before full-scale training; a common alignment recipe is sketched below.
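The exact alignment procedure is not detailed here. A minimal sketch, assuming the common recipe of initializing each new token's embedding from the mean of the base-tokenizer subword embeddings it decomposes into; `model`, `base_tokenizer`, and `extended_tokenizer` are placeholders.

```python
import torch

def align_new_token_embeddings(model, base_tokenizer, extended_tokenizer):
    """Initialize each newly added token's embedding as the mean of the
    base-tokenizer subword embeddings it decomposes into. This is one common
    recipe; the project's actual alignment method may differ."""
    base_vocab_size = len(base_tokenizer)
    model.resize_token_embeddings(len(extended_tokenizer))
    embeddings = model.get_input_embeddings().weight

    with torch.no_grad():
        for token_id in range(base_vocab_size, len(extended_tokenizer)):
            token_text = extended_tokenizer.decode([token_id])
            piece_ids = base_tokenizer.encode(token_text, add_special_tokens=False)
            if piece_ids:  # keep the default init if the decomposition is empty
                embeddings[token_id] = embeddings[piece_ids].mean(dim=0)
    # A fuller recipe would also align the output (LM-head) embeddings if untied.
    return model
```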
Full training on large-scale Japanese corpora follows the alignment stage.
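The corpora themselves are not listed in this section. The sketch below shows one way to stream a public Japanese corpus and pack it into fixed-length pretraining blocks; the dataset id, block size, and use of the base DeepSeek-V3 tokenizer are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative corpus and tokenizer ids; the project's actual data mix is not specified here.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)
corpus = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train", streaming=True)

def packed_blocks(dataset, block_size=4096):
    """Stream documents, tokenize, and yield fixed-length training blocks."""
    buffer = []
    for example in dataset:
        buffer.extend(tokenizer(example["text"], add_special_tokens=False)["input_ids"])
        buffer.append(tokenizer.eos_token_id)  # document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]

# First packed 4K-token block from the streamed corpus.
first_block = next(packed_blocks(corpus))
```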
Curriculum scheduling progresses from short to long contexts, and training is distributed with DeepSpeed and FSDP.
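A minimal sketch of a short-to-long context curriculum driven by training progress. The stage boundaries and sequence lengths are illustrative assumptions; only the final 128K window comes from the specifications above.

```python
def context_length_for_step(step: int, total_steps: int) -> int:
    """Step-wise short-to-long curriculum over training progress.
    Boundaries and lengths below are illustrative, not the project's settings."""
    progress = step / max(total_steps, 1)
    if progress < 0.5:
        return 4_096      # short contexts for early training
    if progress < 0.8:
        return 32_768     # mid-length contexts
    return 131_072        # full 128K window in the final phase

# The scheduler can select which length-bucketed dataloader feeds each step.
for step in (0, 60_000, 95_000):
    print(step, context_length_for_step(step, total_steps=100_000))
```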
Evaluation
JCommonsenseQA: Japanese commonsense reasoning
JNLI: Japanese natural language inference
MARC-ja: Japanese sentiment classification
Deployment
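Deployment details are not yet documented. As a placeholder, a minimal sketch of offline inference with vLLM, assuming a hypothetical checkpoint path; a 671B-parameter MoE model requires a multi-GPU (typically multi-node or quantized) setup in practice.

```python
from vllm import LLM, SamplingParams

# Hypothetical checkpoint path; tensor_parallel_size must match the available GPUs.
llm = LLM(
    model="path/to/japanese-adapted-checkpoint",
    trust_remote_code=True,
    tensor_parallel_size=8,
    max_model_len=131072,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["日本の首都はどこですか？"], params)
print(outputs[0].outputs[0].text)
```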