CVPR 2021 Paper Summaries
Cross-Modal Contrastive Learning for Text-to-Image Generation
by: Seitaro Shinagawa
GAN
Multi modal
Vision and language
Learning by Planning: Language-Guided Global Image Editing
by: Seitaro Shinagawa
Dataset
Vision and language
Scan2Cap: Context-Aware Dense Captioning in RGB-D Scans
by: Katsuyuki Nakamura
3D object detection
Multi modal
Vision and language
FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation
by: Seitaro Shinagawa
Vision and language
UP-DETR: Unsupervised Pre-Training for Object Detection With Transformers
by: 福原吉博 (Yoshihiro Fukuhara)
Object detection
Unsupervised Learning
Rethinking and Improving the Robustness of Image Style Transfer
by: 福原吉博 (Yoshihiro Fukuhara)
Style Transfer
Where and What? Examining Interpretable Disentangled Representations
by: 福原吉博 (Yoshihiro Fukuhara)
Disentanglement
GAN
Exemplar-Based Open-Set Panoptic Segmentation Network
by: Hiroaki Aizawa
Learning Continuous Image Representation With Local Implicit Image Function
by: Hiroaki Aizawa
Separating Skills and Concepts for Novel Visual Question Answering
by: Shintaro Yamamoto
Vision and language
Privacy-Preserving Image Features via Adversarial Affine Subspace Embeddings
by: 福原吉博 (Yoshihiro Fukuhara)
Privacy
Exploring Simple Siamese Representation Learning
by: 福原吉博 (Yoshihiro Fukuhara)
Representation learning
Can We Characterize Tasks Without Labels or Features?
by: Hiroaki Aizawa
SelfDoc: Self-Supervised Document Representation Learning
by: Shintaro Yamamoto
Document recognition
Image Change Captioning by Learning From an Auxiliary Task
by: Shintaro Yamamoto
Vision and language
Multiresolution Knowledge Distillation for Anomaly Detection
by: Shunsuke Nakatsuka
Knowledge distillation
Anomaly detection
Deep Stable Learning for Out-of-Distribution Generalization
by: 福原吉博 (Yoshihiro Fukuhara)
Generalization
How Well Do Self-Supervised Models Transfer?
by: 福原吉博 (Yoshihiro Fukuhara)
Representation learning
Self supervised learning
LAFEAT: Piercing Through Adversarial Defenses With Latent Features
by: 福原吉博 (Yoshihiro Fukuhara)
Adversarial examples
Robustness
Learning Affinity-Aware Upsampling for Deep Image Matting
by: Masanori YANO
Image matting
Upsampling
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
by: 福原吉博 (Yoshihiro Fukuhara)
Attention
Object detection
Transformer
Shape From Sky: Polarimetric Normal Recovery Under the Sky
by: Takahiro Kushida
3D
3D reconstruction
Depth estimation
A Sliced Wasserstein Loss for Neural Texture Synthesis
by: Takayuki Semitsu
Disentanglement
Domain adaptation
Fair Feature Distillation for Visual Recognition
by: 福原吉博 (Yoshihiro Fukuhara)
Knowledge distillation
Representation learning
Fairness
StyleMix: Separating Content and Style for Enhanced Data Augmentation
by: 福原吉博 (Yoshihiro Fukuhara)
Data Augmentation
Generative Classifiers as a Basis for Trustworthy Image Classification
by: Hiroaki Aizawa
Decoupled Dynamic Filter Networks
by: Masanori YANO
Depth estimation
Object detection
Recognition
Upsampling
Representation Learning via Global Temporal Alignment and Cycle-Consistency
by: 福原吉博 (Yoshihiro Fukuhara)
Representation learning
Video
Improving Calibration for Long-Tailed Recognition
by: Akihiro FUJII
Recognition
Robustness
Long-tailed
Open-Vocabulary Object Detection Using Captions
by: Shintaro Yamamoto
Object detection
Vision and language
Adaptive Image Transformer for One-Shot Object Detection
by: Shintaro Yamamoto
N-shot learning
Object detection
Primitive Representation Learning for Scene Text Recognition
by: Hirokatsu Kataoka
Recognition
Representation learning
HyperSeg: Patch-Wise Hypernetwork for Real-Time Semantic Segmentation
by: Shigemichi Matsuzaki
Semantic segmentation
Hypernetwork
Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
by: 福原吉博 (Yoshihiro Fukuhara)
Representation learning
Neural Scene Representation
End-to-End Human Object Interaction Detection With HOI Transformer
by: Shintaro Yamamoto
Human Object Interaction
Look Before You Speak: Visually Contextualized Utterances
by: Shintaro Yamamoto
Video
Vision and language
Adversarial Robustness Across Representation Spaces
by: 福原吉博 (Yoshihiro Fukuhara)
Adversarial examples
Robustness
Image De-Raining via Continual Learning
by: Masanori YANO
Deraining
Learning method
Continual learning
View-Guided Point Cloud Completion
by: Naoya Chiba
3D
3D reconstruction
Depth estimation
Point cloud
Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals
by: Hirokatsu Kataoka
3D
Recognition
Repetitive Activity Counting by Sight and Sound
by: Katsuyuki Nakamura
Action recognition
Multi modal
Video
Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization
by: Katsuhiro Muto
GAN
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
by: Seitaro Shinagawa
Vision and language
On Learning the Geodesic Path for Incremental Learning
by: Shunsuke Nakatsuka
Knowledge distillation
Continual learning
Seesaw Loss for Long-Tailed Instance Segmentation
by: Shunsuke Yoshizawa
Instance segmentation
Segmentation
Loss function
Temporal Action Segmentation From Timestamp Supervision
by: Shunsuke Kogure
Action recognition
Segmentation
Video
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
by: Shintaro Yamamoto
Dataset
Taskology: Utilizing Task Relations at Scale
by: Shintaro Yamamoto
Dynamic Region-Aware Convolution
by: Masanori YANO
Instance segmentation
Object detection
Recognition
Ensembling With Deep Generative Views
by: Akihiro FUJII
Adversarial examples
GAN
Robustness
Ensemble
Physically-Aware Generative Network for 3D Shape Modeling
by: Eisuke Yamagata
3D
3D reconstruction
GAN
Exploiting Aliasing for Manga Restoration
by: Masanori YANO
Attention
Super resolution
Manga restoration
SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation
by: 山田亮佑 (Ryosuke Yamada)
3D
3D object detection
Mask Guided Matting via Progressive Refinement Network
by: Masanori YANO
Dataset
Image matting
Upsampling
Coarse-Fine Networks for Temporal Activity Detection in Videos
by: Katsuyuki Nakamura
Action recognition
Video