arXiv Preprint
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching
Abdalrahman Wael · arXiv:2605.13769 · May 2026
A controlled tiny-scale pretraining study comparing dense and mixture-of-experts transformers under active-parameter and total-parameter matching.
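To make the two matching schemes concrete, the sketch below counts feed-forward parameters for one transformer layer. A mixture-of-experts (MoE) layer stores every expert ("total" parameters) but routes each token through only `top_k` of them ("active" parameters). A dense baseline can be sized to match either count. This is a minimal sketch under stated assumptions, not the paper's setup. Each expert is assumed to be a two-matrix FFN without biases, the router is assumed to be a single bias-free linear layer, and attention is assumed identical across both models so it cancels out. The helper names and the sizes in the usage example are hypothetical.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of a bias-free FFN: up-projection plus down-projection (assumption)."""
    return 2 * d_model * d_ff


def moe_ffn_params(d_model: int, d_ff: int, num_experts: int, top_k: int) -> tuple[int, int]:
    """Return (total, active) parameter counts for one hypothetical MoE FFN layer."""
    expert = ffn_params(d_model, d_ff)
    router = d_model * num_experts  # bias-free linear router (assumption)
    total = num_experts * expert + router  # every expert is stored
    active = top_k * expert + router  # only top_k experts run per token
    return total, active


def matched_dense_d_ff(target_params: int, d_model: int) -> int:
    """Dense FFN hidden width whose parameter count is closest to target_params (rounded down)."""
    return target_params // (2 * d_model)


if __name__ == "__main__":
    # Hypothetical tiny-scale sizes, chosen only for illustration.
    total, active = moe_ffn_params(d_model=256, d_ff=1024, num_experts=8, top_k=2)
    print("MoE total:", total, "active:", active)
    print("dense d_ff, active-matched:", matched_dense_d_ff(active, 256))
    print("dense d_ff, total-matched:", matched_dense_d_ff(total, 256))
```

With these made-up sizes, the active-matched dense model is about 4x narrower than the total-matched one (d_ff of 2052 vs 8196). That gap is the reason the two matching criteria can lead to different conclusions about dense vs sparse models.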