Abstract: With its powerful visual-language alignment capability, CLIP performs well in zero-shot and few-shot learning tasks. However, we found in experiments that CLIP’s logits suffer from serious ...
Abstract: Multimodal sarcasm detection, aiming to uncover sarcastic sentiment behind multimodal data, has gained substantial attention in multimodal communities. Recent advancements in multimodal ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results