Papers

Formal work, published.

Research papers, reports, and longer technical writeups that are worth keeping in one place.

01
arXiv Preprint

Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching

Abdalrahman Wael · arXiv:2605.13769 · May 2026

A controlled tiny-scale pretraining study comparing dense and mixture-of-experts transformers under active-parameter and total-parameter matching.