Lewis M. x 2

References 2

Scaling Expert Language Models with Unsupervised Domain Discovery

Large language models are typically trained densely: all parameters are updated with respect to all ...

Li M., Shi W., Gururangan S., Lewis M., Althoff T.

Branch-train-merge: Embarrassingly parallel training of expert language models

We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel...

Li M., Gururangan S., Lewis M., Dettmers T.

Related keywords

Li M., Shi W., Gururangan S., Althoff T., Dettmers T.