Dettmers T. x 1

References 1

Branch-train-merge: Embarrassingly parallel training of expert language models

We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel...

Li M., Gururangan S., Lewis M., Dettmers T.

Related keywords