LLM · MIT + DeepSeek License · Japanese

Zensei (禅精)

An experimental continued pretraining fork of DeepSeek-V3 exploring Japanese language adaptation. Research project — not production-ready.

Model Specifications

| Specification | Value |
| --- | --- |
| Base model | DeepSeek-V3 |
| Total parameters | 671B |
| Active parameters per token | 37B (MoE) |
| Vocabulary size | 145K tokens |
| Context window | 128K tokens |
| Context extension | YaRN |

Architecture

Built on DeepSeek-V3's MoE backbone.

- Multi-head Latent Attention (MLA) for efficient inference
- DeepSeekMoE with auxiliary-loss-free load balancing
- Vocabulary expansion from 129K to 145K tokens with Japanese subwords (see the embedding-resizing sketch below)
- YaRN extension to a 128K-token context window
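
As a rough illustration of the vocabulary-expansion step, the sketch below adds Japanese subwords to the tokenizer and resizes the embedding matrix with Hugging Face transformers. The checkpoint name, the token list, and the mean-initialisation choice are illustrative assumptions, not the project's published procedure.

```python
# Illustrative sketch: grow the tokenizer with Japanese subwords and resize the
# embedding matrix to match. Checkpoint name and token list are placeholders;
# loading the full 671B model like this is not practical on a single device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3", torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Hypothetical new Japanese subword pieces (the real expansion adds ~16K tokens).
new_tokens = ["こんにちは", "ありがとう", "株式会社"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the input embeddings; initialise the new rows from the mean of the
# existing rows so they start inside the model's embedding distribution.
model.resize_token_embeddings(len(tokenizer))
with torch.no_grad():
    emb = model.get_input_embeddings().weight
    emb[-num_added:] = emb[:-num_added].mean(dim=0, keepdim=True)
```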

Training Pipeline

Two-stage curriculum training.

- Stage 1: Vocabulary alignment. Align the new Japanese tokens with the existing embeddings (a freezing sketch follows this list).
- Stage 2: Continued pretraining. Full training on large-scale Japanese corpora.
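
One common way to realise Stage 1, sketched below, is to freeze the backbone and let only the newly added embedding rows receive gradients; whether Zensei uses exactly this masking is an assumption.

```python
import torch

def freeze_for_vocab_alignment(model, num_new_tokens: int) -> None:
    """Stage 1 sketch: freeze the backbone so only the newly added embedding
    rows are updated, aligning them with the existing representation space."""
    for param in model.parameters():
        param.requires_grad = False

    emb = model.get_input_embeddings().weight
    emb.requires_grad = True

    def mask_old_rows(grad: torch.Tensor) -> torch.Tensor:
        # Zero the gradient of the original vocabulary rows; only new rows move.
        grad = grad.clone()
        grad[:-num_new_tokens] = 0
        return grad

    emb.register_hook(mask_old_rows)
```

After calling `freeze_for_vocab_alignment(model, num_added)`, Stage 1 can run with the standard language-modelling loss on Japanese text before unfreezing all parameters for Stage 2.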

Curriculum scheduling progresses from short to long contexts; distributed training uses DeepSpeed and FSDP.
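
The exact schedule is not spelled out here; the sketch below shows one plausible shape for a short-to-long curriculum, with made-up step milestones and sequence lengths.

```python
# Illustrative short-to-long curriculum: the allowed sequence length grows at
# fixed optimizer-step milestones. All numbers below are assumptions.
CURRICULUM = [
    (0, 4_096),         # start with short contexts
    (10_000, 32_768),   # mid-length documents
    (20_000, 131_072),  # full 128K window after YaRN extension
]

def max_seq_len(step: int) -> int:
    """Return the maximum sequence length allowed at a given training step."""
    length = CURRICULUM[0][1]
    for milestone, seq_len in CURRICULUM:
        if step >= milestone:
            length = seq_len
    return length
```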

Evaluation

- JCommonsenseQA: Japanese commonsense reasoning
- JNLI: Japanese natural language inference
- MARC-ja: Japanese sentiment classification
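
For benchmarks like JCommonsenseQA, a common zero-shot protocol is to score each candidate answer by its token log-likelihood and pick the highest. The sketch below assumes a Hugging Face causal LM, an illustrative Japanese prompt format, and that the prompt tokenisation is a prefix of the prompt-plus-answer tokenisation; it is not the project's evaluation harness.

```python
import torch
import torch.nn.functional as F

def score_choice(model, tokenizer, question: str, choice: str) -> float:
    """Average log-likelihood of `choice` given an illustrative QA prompt."""
    prompt = f"質問: {question}\n答え: "
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i predicts token i + 1, so shift by one and keep the answer tokens.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)
    answer_ids = full_ids[0, prompt_len:]
    answer_logp = log_probs[prompt_len - 1 :].gather(1, answer_ids.unsqueeze(-1))
    return answer_logp.mean().item()

def predict(model, tokenizer, question: str, choices: list[str]) -> int:
    """Index of the highest-scoring candidate answer, JCommonsenseQA-style."""
    scores = [score_choice(model, tokenizer, question, c) for c in choices]
    return max(range(len(choices)), key=lambda i: scores[i])
```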

Deployment

Infrastructure

- FastAPI inference server (a minimal endpoint sketch follows)
- Docker support
- Makefile automation
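
The repository's actual server code is not shown here; the snippet below is a minimal FastAPI sketch of what such an endpoint could look like, with a placeholder model identifier and route.

```python
# Minimal FastAPI inference sketch. MODEL_ID and the /generate route are
# assumptions, not the repository's actual server layout.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "stratophic-lab/zensei"  # placeholder identifier

app = FastAPI(title="Zensei inference")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"completion": tokenizer.decode(output_ids[0], skip_special_tokens=True)}
```

Started with `uvicorn server:app`, the endpoint accepts a JSON body containing `prompt` and returns the generated completion.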

A Stratophic Lab project
