Singh A. × 2

References 2

FLAVA: A foundational language and vision alignment model

… The hateful memes challenge: Detecting hate speech in multimodal memes. Proceedings of ...

2026-01-20 00:00:00

Singh A., Goswami V., Hu R.

Human-adversarial visual question answering

Performance on the most commonly used Visual Question Answering dataset (VQA v2) is starting to appr...

Sheng S., Singh A., Goswami V.

Related keywords