MAViL: Masked Audio-Video Learners
We present Masked Audio-Video Learners (MAViL) to train audio-visual representations. Our approach l...
2022-12-15 00:00:00
Xu H., Huang P.Y., Sharma V., Ryali C., Fan H., Li Y.
CiT: Curation in training for effective vision-language data
Large vision-language models are generally applicable to many downstream tasks, but come at an exorb...
Howes R., Xu H., Xie S., Huang P.Y., Yu L.