CiT: Curation in training for effective vision-language data
Large vision-language models are generally applicable to many downstream tasks, but come at an exorb...
Howes R., Xu H., Xie S., Huang P.Y., Yu L.