Lewis M. x 2
References 2
Scaling Expert Language Models with Unsupervised Domain Discovery
Large language models are typically trained densely: all parameters are updated with respect to all ...
Branch-train-merge: Embarrassingly parallel training of expert language models
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel...