Huang P.Y. x 2

References 2

MAViL: Masked Audio-Video Learners

We present Masked Audio-Video Learners (MAViL) to train audio-visual representations. Our approach l...

2022-12-15

Xu H., Huang P.Y., Sharma V., Ryali C., Fan H., Li Y.

CiT: Curation in Training for Effective Vision-Language Data

Large vision-language models are generally applicable to many downstream tasks, but come at an exorb...

Howes R., Xu H., Xie S., Huang P.Y., Yu L.

Related keywords