    cvpaper.challenge

CVPR 2022 Paper Summaries

tag: vision-and-language

End-to-End Referring Video Object Segmentation With Multimodal Transformers
by: Ryuichi Nakahara
Segmentation, Video, Vision and language
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation
by: Hirokatsu Kataoka
Dataset, Representation learning, Video, Vision and language
TubeDETR: Spatio-Temporal Video Grounding With Transformers
by: Kazuki Omi
Multi modal, Object detection, Video, Vision and language
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
by: Ryuichi Nakahara
Video, Vision and language
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
by: Ryuichi Nakahara
Video, Vision and language
©2019 cvpaper.challenge