Sinha K. × 1

References: 1

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

A possible explanation for the impressive performance of masked language model (MLM) pre-training is...

Pineau J., Williams A., Sinha K., Jia R., Hupkes D.

Related keywords