cvpaper.challenge
CVPR2022 Paper Summaries
tag: vision-and-language
End-to-End Referring Video Object Segmentation With Multimodal Transformers
by: Ryuichi Nakahara
Segmentation
Video
Vision and language
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
by: Hirokatsu Kataoka
Dataset
Representation learning
Video
Vision and language
TubeDETR: Spatio-Temporal Video Grounding With Transformers
by: Kazuki Omi
Multi modal
Object detection
Video
Vision and language
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
by: Ryuichi Nakahara
Video
Vision and language
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
by: Ryuichi Nakahara
Video
Vision and language