CVPR 2022 Paper Summaries

Instance segmentation Segmentation Real-time

Few-Shot Incremental Learning for Label-to-Image Translation

by: Yoshi Truong

GAN N-shot learning

Active Learning for Open-Set Annotation

by: Anonymous

Active learning

Sparse Instance Activation for Real-Time Instance Segmentation

by: Yoshiki Kubotani

Evaluation-Oriented Knowledge Distillation for Deep Face Recognition

by: 鈴木共生

Knowledge distillation Recognition Face

Self-Supervised Video Transformer

by: 志田　遥飛

HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing

by: Shion Hinda

Optimal Correction Cost for Object Detection Evaluation

by: Masanori YANO

3D reconstruction Pose estimation SLAM NeRF

NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

by: shoji sonoyama

Equivariant Point Cloud Analysis via Learning Orientations for Message Passing

by: Naoya Chiba

3D Disentanglement Point cloud

Syntax-Aware Network for Handwritten Mathematical Expression Recognition

by: Atsuki Osanai

Recognition Text Handwritten

Replacing Labeled Real-Image Datasets With Auto-Generated Contours

by: Anonymous

Dataset Representation learning Self supervised learning

Pushing the Performance Limit of Scene Text Recognizer Without Human Annotation

by: Atsuki Osanai

Domain adaptation Recognition Representation learning Self supervised learning Text Semi-supervised Learning

Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

by: Nakano Shuhei

Dataset Instance segmentation text layout analysis

Towards End-to-End Unified Scene Text Detection and Layout Analysis

by: Atsuki Osanai

Learning Local Displacements for Point Cloud Completion

by: Naoya Chiba

3D 3D reconstruction Point cloud Semantic segmentation

CAFE: Learning To Condense Dataset by Aligning Features

by: Yuma Ochi

3D Point cloud Semantic segmentation

Surface Representation for Point Clouds

by: Naoya Chiba

POCO: Point Convolution for Surface Reconstruction

by: 古川遼

3D reconstruction Point cloud

Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation

by: Masanori YANO

Image completion Image outpainting Transformer

Dataset Distillation by Matching Training Trajectories

by: Yuma Ochi

Pose estimation Camera calibration

Camera Pose Estimation Using Implicit Distortion Models

by: shoji sonoyama

Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast

by: Anonymous

Segmentation Video Vision and language

End-to-End Referring Video Object Segmentation With Multimodal Transformers

by: Ryuichi Nakahara

Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation

by: Naoya Chiba

3D Point cloud Self supervised learning Super resolution

Masked Feature Prediction for Self-Supervised Visual Pre-Training

by: 志田　遥飛

Neural architecture search(NAS)

Searching the Deployable Convolution Neural Networks for GPUs

by: Yuma Ochi

Neural Point Light Fields

by: Masanori YANO

3D Point cloud NeRF

Vision Transformer With Deformable Attention

by: Kaikai Zhao

Attention Instance segmentation Video

Playable Environments: Video Manipulation in Space and Time

by: Akihiro FUJII

3D GAN NeRF

Temporally Efficient Vision Transformer for Video Instance Segmentation

by: Akihiro FUJII

Recurrent Glimpse-Based Decoder for Detection With Transformer

by: Anonymous

Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning

by: kodai nakashima

TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition

by: 志田　遥飛

3D Point cloud Self supervised learning Super resolution

Opening Up Open World Tracking

by: Masanori YANO

Dataset Object tracking

LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints

by: Naoya Chiba

NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction

by: 佐藤凜太郎

Person re-identification Self supervised learning

Large-Scale Pre-Training for Person Re-Identification With Noisy Labels

by: Hirokatsu Kataoka

Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation

by: Hirokatsu Kataoka

Dataset Representation learning Video Vision and language

ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes

by: Hirokatsu Kataoka

Dataset Instance segmentation

Proper Reuse of Image Classification Features Improves Object Detection

by: Anonymous

3D object detection Dataset Point cloud

The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift

by: Hirokatsu Kataoka

Dataset Object detection

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

by: Hirokatsu Kataoka

Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability

by: 田所龍

Knowledge distillation Representation learning

RegNeRF: Regularizing Neural Radiance Fields for View Synthesis From Sparse Inputs

by: Hirokatsu Kataoka

360MonoDepth: High-Resolution 360° Monocular Depth Estimation

by: Ryunosuke Isikawa

3D 3D object detection

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

by: Anonymous

3D Point cloud Representation learning Self supervised learning

SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

by: Yuma Ochi

3D Dataset

Density-Preserving Deep Point Cloud Compression

by: Naoya Chiba

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions

by: Keiichi Sawada

Domain adaptation Representation learning

What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors

by: 奥西泰地

ICON: Implicit Clothed Humans Obtained From Normals

by: Ryosuke Yamada

QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection

by: Anonymous

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

by: Hirokatsu Kataoka

Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC

by: 鈴木共生

Robustness Face

Unsupervised Image-to-Image Translation With Generative Prior

by: Anonymous

3D Object detection Point cloud Pose estimation Segmentation Semantic segmentation

SoftGroup for 3D Instance Segmentation on Point Clouds

by: Naoya Chiba

Reflash Dropout in Image Super-Resolution

by: Masanori YANO

Super resolution Dropout

TubeDETR: Spatio-Temporal Video Grounding With Transformers

by: Kazuki Omi

Multi modal Object detection Video Vision and language

DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation

by: Yoshi Truong

Diffusion

Learning a Structured Latent Space for Unsupervised Point Cloud Completion

by: Naoya Chiba

3D Disentanglement Point cloud Representation learning Self supervised learning

Detecting Deepfakes With Self-Blended Images

by: 岡本大和

deepfake

Affine Medical Image Registration With Coarse-To-Fine Vision Transformer

by: cho

3D Attention

Enhancing Adversarial Training With Second-Order Statistics of Weights

by: S Ishikawa

Robustness

Everything at Once – Multi-Modal Fusion Transformer for Video Retrieval

by: Chihiro Nakatani（中谷千洋）

Multi modal Video

SimMIM: A Simple Framework for Masked Image Modeling

by: Ryosuke Yamada

Action recognition Recognition Video

Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition

by: Chihiro Nakatani（中谷千洋）

X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval

by: Ryuichi Nakahara

Video-Text Representation Learning via Differentiable Weak Temporal Alignment

by: Ryuichi Nakahara

3D 3D reconstruction Disentanglement Mesh

Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes

by: 古川遼

Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination

by: Anonymous

3D Action recognition Dataset Pose estimation

SimVQA: Exploring Simulated Environments for Visual Question Answering

by: QIUYUE

Contextual Debiasing for Visual Recognition With Causal Mechanisms

by: QIUYUE

Attention Manipulation Detection

ObjectFormer for Image Manipulation Detection and Localization

by: 岡本大和（LINE Computer Vision Lab）

Proactive Image Manipulation Detection

by: 岡本大和（LINE Computer Vision Lab）

Adversarial examples Attention Manipulation Detection

Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

by: 岡本大和（LINE Computer Vision Lab）

Adversarial examples Fake

Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring

by: Yoshi Truong

Dataset GAN Multi modal Vision and language

TableFormer: Table Structure Understanding With Transformers

by: 岡本大和（LINE Computer Vision Lab）

Object detection document analysis

Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer

by: Anonymous

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

by: cho

Attention

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering

by: Ryuichi Nakahara

3D Point cloud Pose estimation Self supervised learning

Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes

by: Naoya Chiba

CLIPstyler: Image Style Transfer With a Single Text Condition

by: Takeru Endo

Multi modal Vision and language

UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog

by: Takeru Endo

Multi modal Vision and language

Confidence Propagation Cluster: Unleash Full Potential of Object Detectors

by: Anonymous

GAN Super resolution Face restoration

Rethinking Deep Face Restoration

by: Masanori YANO

Spatial Commonsense Graph for Object Localisation in Partial Scenes

by: 綱島秀樹

3D 3D Object Localization GNN

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

by: Anonymous

3D object detection Point cloud

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

by: Norikatsu Sumi

3D 3D reconstruction Point cloud Self supervised learning

Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors

by: Naoya Chiba

Recurrent Dynamic Embedding for Video Object Segmentation

by: 上田　樹

Segmentation Video

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

by: Norikatsu Sumi

Disentanglement Neural radiance fields (NeRF) Vision and language

Block-NeRF: Scalable Large Scene Neural View Synthesis

by: Hirokatsu Kataoka

Rethinking Minimal Sufficient Representation in Contrastive Learning

by: take

Learning With Neighbor Consistency for Noisy Labels

by: take

Dataset Domain adaptation

Towards Driving-Oriented Metric for Lane Detection Models

by: Yuma Ochi

Surface Reconstruction From Point Clouds by Learning Predictive Context Priors

by: Naoya Chiba

3D Point cloud Self supervised learning

Point-NeRF: Point-Based Neural Radiance Fields

by: 佐藤凜太郎

3D 3D reconstruction Depth estimation Neural radiance fields (NeRF) Point cloud

Neural 3D Scene Reconstruction With the Manhattan-World Assumption

by: 佐藤凜太郎

3D 3D reconstruction Depth estimation Neural radiance fields (NeRF) Segmentation Semantic segmentation

Object-Region Video Transformers

by: Shuhei M. Yoshida

Action recognition Object detection Video

Correlation Verification for Image Retrieval

by: Masanori YANO

Image retrieval

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

by: QIUYUE

M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers

by: QIUYUE

3D Disentanglement Point cloud Self supervised learning

Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild

by: Masanori YANO

Dataset Object tracking

IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment

by: Naoya Chiba

Self-Supervised Models Are Continual Learners

by: Shida Haruhi

Attention Knowledge distillation Object detection

TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization

by: Anonymous

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

by: QIUYUE

3D 3D reconstruction Dataset Multi modal Neural radiance fields (NeRF)

A Probabilistic Graphical Model Based on Neural-Symbolic Reasoning for Visual Relationship Detection

by: QIUYUE

NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images

by: Hirokatsu Kataoka

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields

by: Hirokatsu Kataoka

On the Integration of Self-Attention and Convolution

by: cho

Attention

Towards Efficient and Scalable Sharpness-Aware Minimization

by: S Ishikawa

Attention optimization

SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning

by: Tomoya Nitta

Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection

by: Kazuki Omi

Attention Instance segmentation Segmentation Semantic segmentation

Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation

by: Shuhei M. Yoshida

Robustness Noisy labels

Masked-Attention Mask Transformer for Universal Image Segmentation

by: Takehiro Matsuda

Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering

by: Naoya Chiba

3D Object detection Point cloud Pose estimation

Accelerating DETR Convergence via Semantic-Aligned Matching

by: Masanori YANO

Action recognition Attention N-shot learning Recognition Video

Spatio-Temporal Relation Modeling for Few-Shot Action Recognition

by: Shuhei M. Yoshida

Hybrid Relation Guided Set Matching for Few-Shot Action Recognition

by: Shuhei M. Yoshida

Action recognition N-shot learning Recognition Video

ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging

by: 古川遼

Representation learning Generative model

Diffusion Autoencoders: Toward a Meaningful and Decodable Representation

by: 朝岡忠

Measuring Compositional Consistency for Video Question Answering

by: QIUYUE

Action recognition Dataset Video Vision and language

SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

by: QIUYUE

Urban Radiance Fields

by: Hirokatsu Kataoka

Dataset N-shot learning Synthetic Data

Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data

by: Tatsuya Onishi

Temporal Alignment Networks for Long-Term Video

by: Jun Kimata

Action recognition Dataset Multi modal Video Vision and language

Few-Shot Font Generation by Learning Fine-Grained Local Styles

by: Daichi Haraguchi

Font generation Image translation

REGTR: End-to-End Point Cloud Correspondences With Transformers

by: Naoya Chiba

MogFace: Towards a Deeper Appreciation on Face Detection

by: 鈴木共生

Object detection Face

InstaFormer: Instance-Aware Image-to-Image Translation With Transformer

by: Anonymous

Semantic segmentation Weakly-supervised

Scaling Vision Transformers

by: Sora Takashima （高島空良）

Attention Recognition

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

by: 西村和也（九大）

C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

by: 西村和也（九大）

Semantic segmentation Weakly-supervised learning

Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement

by: 西村和也　（九大）

Instance segmentation Weakly supervised learning

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

by: Anonymous

Instance segmentation Self supervised learning

FreeSOLO: Learning To Segment Objects Without Annotations

by: Masanori YANO

A-ViT: Adaptive Tokens for Efficient Vision Transformer

by: Anonymous

3D Multi modal Point cloud

Multimodal Colored Point Cloud to Image Alignment

by: Naoya Chiba

Self-Supervised Dense Consistency Regularization for Image-to-Image Translation

by: Anonymous

Object detection Recognition Segmentation

A ConvNet for the 2020s

by: Anonymous

Global Tracking Transformers

by: Masanori YANO

Object tracking Transformer

Style-ERD: Responsive and Coherent Online Motion Style Transfer

by: Kosuke Fukazawa

3D Motion Synthesis

Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions

by: Yuma Ochi

3D Dataset Depth estimation Instance segmentation Video

FENeRF: Face Editing in Neural Radiance Fields

by: Hirokatsu Kataoka

3D reconstruction GAN Neural radiance fields (NeRF)

HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs

by: Hirokatsu Kataoka

3D reconstruction Neural radiance fields (NeRF)

NAN: Noise-Aware NeRFs for Burst-Denoising

by: Hirokatsu Kataoka

Deblur-NeRF: Neural Radiance Fields From Blurry Images

by: Hirokatsu Kataoka

Neural radiance fields (NeRF) Semantic segmentation

Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

by: Hirokatsu Kataoka

StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF via 2D-3D Mutual Learning

by: Hirokatsu Kataoka

NeRF-Editing: Geometry Editing of Neural Radiance Fields

by: Hirokatsu Kataoka

Adversarial examples Dataset Recognition Robustness

Does Robustness on ImageNet Transfer to Downstream Tasks?

by: Hirokatsu Kataoka

Do Explanations Explain? Model Knows Best

by: Hirokatsu Kataoka

Robustness

SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis

by: Anonymous

Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs

by: Anonymous

3D Point cloud Pose estimation Self supervised learning

RCP: Recurrent Closest Point for Point Cloud

by: Naoya Chiba

Simple but Effective: CLIP Embeddings for Embodied AI

by: 朝岡忠

Representation learning Embodied AI

MetaFormer Is Actually What You Need for Vision

by: Anonymous

Instance segmentation Object detection Recognition

Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization

by: Kawano Yasufumi

Semantic segmentation Self supervised learning

Label Matching Semi-Supervised Object Detection

by: Masanori YANO

Object detection Semi-supervised learning

CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision

by: Shunsuke Yoshizawa

Dataset Segmentation Semantic segmentation Self supervised learning

Towards Better Understanding Attribution Methods

by: Yuya Yoshikawa

Recognition Explainability

AutoRF: Learning 3D Object Radiance Fields From Single View Observations

by: Norikatsu Sumi

3D 3D object detection Point cloud

Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

by: shoji sonoyama

Pose estimation SLAM

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds

by: Naoya Chiba

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

by: Masanori YANO

Dataset Object tracking

Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection

by: Yusuke Saito

Saliency detection Light field

Ego4D: Around the World in 3,000 Hours of Egocentric Video

by: Masanori YANO

Action recognition Dataset Multi modal Video Vision and language

A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation

by: Anonymous

3D 3D object detection Multi modal Point cloud

Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior

by: 志田　遥飛

generation

RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks

by: Yuma Ochi

SNN

Plenoxels: Radiance Fields Without Neural Networks

by: 佐藤凜太郎

3D Point cloud Self supervised learning

Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds

by: Naoya Chiba

Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning

by: 志田遥飛

Optical flow Video Video prediction Video frame interpolation

Generalizing Gaze Estimation With Rotation Consistency

by: Anonymous

Pose estimation

Comparing Correspondences: Video Prediction With Correspondence-Wise Losses

by: Masanori YANO

Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective

by: Anonymous

weakly supervised learning

Depth-Supervised NeRF: Fewer Views and Faster Training for Free

by: Anonymous

3D 3D reconstruction Depth estimation Neural radiance fields (NeRF)

Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer

by: Anonymous

Pose estimation

Improving Neural Implicit Surfaces Geometry With Patch Warping

by: 佐藤凜太郎

InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering

by: Anonymous

Maintaining Reasoning Consistency in Compositional Visual Question Answering

by: QIUYUE

3D Video Vision and language

Generating Diverse and Natural 3D Human Motions From Text

by: QIUYUE

Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer

by: Ryo Muto

GAN Multi modal Object detection Vision and language

Human Mesh Recovery From Multiple Shots

by: 上田　樹

3D reconstruction Pose estimation Video

Layered Depth Refinement With Mask Guidance

by: Yuhi Matsuo

3D Object detection Point cloud Segmentation Semantic segmentation

StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2

by: Anonymous

3D GAN Video

A Unified Query-Based Paradigm for Point Cloud Understanding

by: Naoya Chiba

MPViT: Multi-Path Vision Transformer for Dense Prediction

by: Anonymous

Object detection Recognition Segmentation

HyperInverter: Improving StyleGAN Inversion via Hypernetwork

by: 加藤義道

Recognition Self supervised learning Video Hand-object interaction

Human Hands As Probes for Interactive Object Understanding

by: Shuhei M. Yoshida

Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale

by: QIUYUE

Shunted Self-Attention via Multi-Scale Token Aggregation

by: Anonymous

Prompt Distribution Learning

by: QIUYUE

Semantic segmentation Video Face swapping Forgery detection Image authenticity

Few-Shot Head Swapping in the Wild

by: 加藤義道

Volumetric Bundle Adjustment for Online Photorealistic Scene Capture

by: Anonymous

Recognition Self supervised learning Explainability

MotionAug: Augmentation With Physical Correction for Human Motion Prediction

by: Takahiro Maeda

Human motion

A Framework for Learning Ante-Hoc Explainable Models via Concepts

by: Yuya Yoshikawa

FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks

by: Tomoya Nitta

Person re-identification Representation learning Self supervised learning

Geometric Transformer for Fast and Robust Point Cloud Registration

by: Naoya Chiba

Structured Local Radiance Fields for Human Avatar Modeling

by: Keiichi Sawada

Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

by: Anonymous

3D 3D object detection Dataset Point cloud

Point Cloud Pre-Training With Natural 3D Structures

by: Masanori YANO

Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions

by: Tomoya Nitta

Dataset Representation learning Video Vision and language

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

by: Ryuichi Nakahara

HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video

by: Hirokatsu Kataoka

Neural radiance fields (NeRF) Pose estimation

MiniViT: Compressing Vision Transformers With Weight Multiplexing

by: Anonymous

Knowledge distillation Object detection Recognition

Omnivore: A Single Model for Many Visual Modalities

by: Hirokatsu Kataoka

3D Action recognition Dataset Recognition Semantic segmentation

Zero-Shot Text-Guided Object Generation With Dream Fields

by: cho

3D Neural radiance fields (NeRF)

Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning

by: Hirokatsu Kataoka

3D Point cloud Segmentation Semantic segmentation

SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation

by: Naoya Chiba

BEHAVE: Dataset and Method for Tracking Human Object Interactions

by: Hirokatsu Kataoka

Pose estimation

Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

by: QIUYUE

Attention Vision and language

Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding

by: QIUYUE

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

by: QIUYUE

Object detection Self supervised learning Vision and language

Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives

by: Hirokatsu Kataoka

3D Dataset Semantic segmentation

Multi-View Transformer for 3D Visual Grounding

by: QIUYUE

3D Vision and language

Deep Hierarchical Semantic Segmentation

by: Anonymous

3D 3D object detection 3D reconstruction

Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction

by: Ryunosuke Ishikawa

Multi-Modal Dynamic Graph Transformer for Visual Grounding

by: QIUYUE

Vision-Language Pre-Training for Boosting Scene Text Detectors

by: Hirokatsu Kataoka

Visual Abductive Reasoning

by: QIUYUE

3D object detection Point cloud

Fast Point Transformer

by: Hirokatsu Kataoka

Learning From All Vehicles

by: Hirokatsu Kataoka

self-driving cars

Query and Attention Augmentation for Knowledge-Based Explainable Reasoning

by: QIUYUE

HairCLIP: Design Your Hair by Text and Reference Image

by: cho

Mobile-Former: Bridging MobileNet and Transformer

by: Ryo Takahashi

Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution

by: Ryuichi Nakahara

Medical Image

Hierarchical Self-Supervised Representation Learning for Movie Understanding

by: 志田遥飛

3D reconstruction Human mesh recovery

Occluded Human Mesh Recovery

by: Masanori YANO

EfficientNeRF: Efficient Neural Radiance Fields

by: Anonymous

3D 3D reconstruction Neural radiance fields (NeRF) N-shot learning

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

by: Anonymous

Adversarial examples Data Augmentation

REX: Reasoning-Aware and Grounded Explanation

by: QIUYUE

VisualHow: Multimodal Problem Solving

by: QIUYUE

Representation learning Vision and language

Multi-Modal Alignment Using Representation Codebook

by: QIUYUE

Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion

by: Takahiro Maeda

Trajectory prediction

MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound

by: QIUYUE

High-Resolution Image Harmonization via Collaborative Dual Transformations

by: Shoma Iwai

Image Harmonization

NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge

by: QIUYUE

Object detection Recognition Video

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

by: Anonymous

Show, Deconfound and Tell: Image Captioning With Causal Inference

by: QIUYUE

Adversarial examples Dataset Representation learning Robustness

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

by: Satoki I

AdaMixer: A Fast-Converging Query-Based Object Detector

by: Anonymous

How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

by: cho

Action recognition Video Action anticipation

Future Transformer for Long-Term Action Anticipation

by: Yasufumi Kawano

Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders

by: Hiroki Okawa

Dataset Semantic segmentation Hyperspectral

A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting

by: Yasufumi Kawano

Action recognition Video Action anticipation

Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

by: Kazuhito Sato

3D 3D reconstruction Neural radiance fields (NeRF) Transformer

DATA: Domain-Aware and Task-Aware Self-Supervised Learning

by: 志田遥飛

Neural architecture search(NAS) Self supervised learning

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

by: Naoya Chiba

3D Multi modal N-shot learning Point cloud Representation learning Self supervised learning

Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness

by: 加藤義道

Self supervised learning Face swapping

Glass Segmentation Using Intensity and Spectral Polarization Cues

by: Teppei Kurita

Dataset Segmentation

Fisher Information Guidance for Learned Time-of-Flight Imaging

by: Teppei Kurita

Semantic segmentation Vision and language

CRIS: CLIP-Driven Referring Image Segmentation

by: Teppei Kurita

Pixel Screening Based Intermediate Correction for Blind Deblurring

by: Teppei Kurita

Deblur

Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World

by: Teppei Kurita

Adversarial Attacks

Context-Aware Video Reconstruction for Rolling Shutter Cameras

by: Teppei Kurita

Video Rolling Shutter

TransforMatcher: Match-to-Match Attention for Semantic Correspondence

by: Teppei Kurita

Image Matching

Toward Fast, Flexible, and Robust Low-Light Image Enhancement

by: Teppei Kurita

Image Enhancement

DR.VIC: Decomposition and Reasoning for Video Individual Counting

by: 画像ベース

Person re-identification Video

Improving Visual Grounding With Visual-Linguistic Verification and Iterative Reasoning

by: QIUYUE

3D 3D reconstruction Self supervised learning

Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors

by: QIUYUE

Align and Prompt: Video-and-Language Pre-Training With Entity Prompts

by: QIUYUE

Self supervised learning Video Vision and language

ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts

by: Yanjun SUN

Multi modal Vision and language

EnvEdit: Environment Editing for Vision-and-Language Navigation

by: Yanjun SUN

Multi modal Vision and language Data augmentation

CHEX: CHannel EXploration for CNN Model Compression

by: Ryo Takahashi

Knowledge distillation Recognition

Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

by: 鈴木共生

Attention Face

ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds

by: Naoya Chiba

Point cloud Pose estimation

Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data

by: 志田遥飛

Self supervised learning automaton

Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects

by: Masanori YANO

3D 3D reconstruction Representation learning Unsupervised learning

Region-Aware Face Swapping

by: 加藤義道

GAN Face Swapping

Neural 3D Video Synthesis From Multi-View Video

by: Anonymous

3D 3D reconstruction Neural radiance fields (NeRF) N-shot learning Video

Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction

by: Anonymous

3D 3D reconstruction Neural radiance fields (NeRF) N-shot learning

Dancing Under the Stars: Video Denoising in Starlight

by: Teppei Kurita

Video Denoise

Controllable Dynamic Multi-Task Architectures

by: Kazuki Omi

Neural architecture search(NAS) Multi-Task learning

RegionCLIP: Region-Based Language-Image Pretraining

by: cho

Towards Robust Adaptive Object Detection Under Noisy Annotations

by: Ryota Hashiguchi

Knowledge distillation few-shot class incremental learning

Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

by: yasud

GenDR: A Generalized Differentiable Renderer

by: Norikatsu Sumi

3D reconstruction Differentiable renderer

Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision

by: Norikatsu Sumi

3D Pose estimation

I M Avatar: Implicit Morphable Head Avatars From Videos

by: 山田亮佑 (Ryosuke Yamada)

Adaptive Early-Learning Correction for Segmentation From Noisy Annotations

by: Ryota Hashiguchi

Semantic segmentation Semi-supervised learning Pseudo-labeling

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

by: Tatsuya Onishi

PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking

by: Masanori YANO

Dataset Person re-identification Pose estimation Object tracking Pose tracking Person search

Clothes-Changing Person Re-Identification With RGB Modality Only

by: Masanori YANO

Dataset Person re-identification

Masked Autoencoders Are Scalable Vision Learners

by: take

Representation learning Video

Unsupervised Pre-Training for Temporal Action Localization Tasks

by: Ryota Hashiguchi

SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration

by: Naoya Chiba

Disentanglement GAN Video Face Swapping

High-Resolution Face Swapping via Latent Semantics Disentanglement

by: 加藤義道

OpenTAL: Towards Open Set Temporal Action Localization

by: Ryota Hashiguchi

Video Temporal Action Localization

General Facial Representation Learning in a Visual-Linguistic Manner

by: 鈴木共生

N-shot learning Vision and language Face

3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection

by: Anonymous

3D 3D object detection Adversarial examples Dataset

Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization

by: Ryota Hashiguchi

Video Temporal Action Localization

ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation

by: Naoya Chiba

3D Adversarial examples Point cloud Pose estimation

XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding

by: Atsuki Osanai

Multi modal Robustness

SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition

by: Atsuki Osanai

Text Spotting

Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding

by: Masanori YANO

Action recognition Dataset Pose estimation Video Vision and language Video grounding

Knowledge Mining With Scene Text for Fine-Grained Recognition

by: Atsuki Osanai

Multi modal Recognition

VRDFormer: End-to-End Video Visual Relation Detection With Transformers

by: Shuhei M. Yoshida

Attention Recognition Video Visual relation detection

Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer

by: Atsuki Osanai

Text Spotting Weakly Supervised Learning

DETReg: Unsupervised Pretraining With Region Priors for Object Detection

by: Atsuki Osanai

Object detection Unsupervised Learning

Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning

by: Shuhei M. Yoshida

Video Vision and language Temporal grounding

Interactron: Embodied Adaptive Object Detection

by: 朝岡忠

Meta learning Object detection Self supervised learning

Improving Video Model Transfer With Dynamic Representation Learning

by: Shuhei M. Yoshida

Action recognition Representation learning Video

MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions

by: QIUYUE

Dataset Video Vision and language

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

by: QIUYUE

Depth estimation Instance segmentation Semantic segmentation

Texture-Based Error Analysis for Image Super-Resolution

by: Shoma Iwai

Super resolution

CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation

by: cho

3D CLIP

PanopticDepth: A Unified Framework for Depth-Aware Panoptic Segmentation

by: Yasufumi Kawano

Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients

by: Naoya Chiba

3D Adversarial examples Point cloud

Moving Window Regression: A Novel Approach to Ordinal Regression

by: 鈴木共生

Face Age estimation

Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

by: Hiroki Nakamura

Recognition Vision and language Zero-shot learning Attribute recognition

Disentangling Visual Embeddings for Attributes and Objects

by: Shuhei M. Yoshida

End-to-End Generative Pretraining for Multimodal Video Captioning

by: Tomoya Nitta

Multi modal Video Vision and language

Real-Time Object Detection for Streaming Perception

by: 朝岡忠

Object detection Streaming perception

Scene Graph Expansion for Semantics-Guided Image Outpainting

by: QIUYUE

Domain adaptation Object tracking

Unsupervised Domain Adaptation for Nighttime Aerial Tracking

by: Anonymous

FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment

by: Hirokatsu Kataoka

Action recognition Dataset

Task Adaptive Parameter Sharing for Multi-Task Learning

by: Kazuki Omi

Object detection Recognition Multi-Task Learning

Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization

by: Jun Kimata

Action recognition Adversarial examples Video

CLIP-Event: Connecting Text and Images With Event Structures

by: QIUYUE

Dataset Self supervised learning Vision and language

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation

by: Hirokatsu Kataoka

3D Representation learning

Continuous Scene Representations for Embodied AI

by: Yanjun SUN

Dynamic Sparse R-CNN

by: Hirokatsu Kataoka

Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models

by: Taiki Sugiura

Video Frame Interpolation Transformer

by: Kensho Hara

Video Frame Interpolation With Transformer

by: Kensho Hara

GAN Knowledge distillation Vision and language Face

Failure Modes of Domain Generalization Algorithms

by: Fumiharu Suzuki

Domain adaptation

AnyFace: Free-Style Text-To-Face Synthesis and Manipulation

by: 鈴木共生

Self-Supervised Keypoint Discovery in Behavioral Videos

by: 志田遥飛

Attention Object tracking Memory network Transformer

MeMOT: Multi-Object Tracking With Memory

by: Masanori YANO

A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models

by: 志田遥飛

Self supervised learning Transparency Fairness Accountability Privacy & Ethics in Vision

Dynamic Scene Graph Generation via Anticipatory Pre-Training

by: Yoshiki Nagasaki

Video Scene Graph

SGTR: End-to-End Scene Graph Generation With Transformer

by: Yoshiki Nagasaki

Attention Scene Graph Generation

Pointly-Supervised Instance Segmentation

by: 山田亮佑 (Ryosuke Yamada)

Instance segmentation

Backdoor Attacks on Self-Supervised Learning

by: 山田亮佑 (Ryosuke Yamada)

3D Adversarial examples Point cloud

Shape-Invariant 3D Adversarial Point Clouds

by: Naoya Chiba

Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships

by: Seitaro Shinagawa

Attention Video Vision and language Scene graph generation

Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs

by: Shuhei M. Yoshida

Dual-Path Image Inpainting With Auxiliary GAN Inversion

by: Taiki Sugiura

3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

by: QIUYUE

3D Vision and language

Reinforced Structured State-Evolution for Vision-Language Navigation

by: Yanjun SUN

WebQA: Multihop and Multimodal QA

by: QIUYUE

Action recognition Adversarial examples Object detection Self supervised learning Video

Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision

by: Jun Kimata

Contrastive Test-Time Adaptation

by: Hirokatsu Kataoka

Domain adaptation Representation learning Self supervised learning

EfficientNeRF: Efficient Neural Radiance Fields

by: Anonymous

3D 3D reconstruction Neural radiance fields (NeRF) N-shot learning

PointCLIP: Point Cloud Understanding by CLIP

by: cho

3D Point cloud

Point-Level Region Contrast for Object Detection Pre-Training

by: 山田亮佑 (Ryosuke Yamada)

Object detection Self supervised learning

Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection

by: hayamizu ryo

Multi modal

Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture

by: 鈴木共生

Pose estimation Face

Cross-Domain Few-Shot Learning With Task-Specific Adapters

by: Kazuki Omi

Knowledge distillation Meta learning Few-shot learning

Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices?

by: Anonymous

3D Dataset Multi modal

DESTR: Object Detection With Split Transformer

by: Masanori YANO

Self supervised learning Generation model

SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization

by: 志田遥飛

Point Cloud Color Constancy

by: Naoya Chiba

3D Dataset Point cloud

Complex Video Action Reasoning via Learnable Markov Logic Network

by: Shuhei M. Yoshida

Action recognition Recognition Logical inference

HDR-NeRF: High Dynamic Range Neural Radiance Fields

by: Tomohiro Hayase

Attention Object detection

OW-DETR: Open-World Detection Transformer

by: 朝岡忠

TrackFormer: Multi-Object Tracking With Transformers

by: 近藤佑樹(Kondo, Yuki)

Attention Instance segmentation Object detection Multi-object tracking

Beyond Fixation: Dynamic Window Visual Transformer

by: Anonymous

Learning Program Representations for Food Images and Cooking Recipes

by: QIUYUE

Hierarchical Modular Network for Video Captioning

by: QIUYUE

Self supervised learning model pruning

Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction

by: 志田遥飛

VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning

by: QIUYUE

semi-supervised learning medical image recognition

LAR-SR: A Local Autoregressive Model for Image Super-Resolution

by: Shoma Iwai

Super resolution

ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification

by: 西村和也 (Kyushu University)

A Simple Data Mixing Prior for Improving Self-Supervised Learning

by: 志田遥飛

Self supervised learning Data Mixing ViT

MixFormer: Mixing Features Across Windows and Dimensions

by: Anonymous

Semantic segmentation weakly-supervised learning

Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

by: 西村和也 (Kyushu University)

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

by: Kazuki Omi

Multi modal Video Vision and language Multi-Task learning

Cross-Architecture Self-Supervised Video Representation Learning

by: 志田遥飛

Representation learning Self supervised learning Video

Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling

by: Naoya Chiba

3D 3D reconstruction Point cloud Super resolution

CLRNet: Cross Layer Refinement Network for Lane Detection

by: Masanori YANO

Segmentation Lane detection

Proto2Proto: Can You Recognize the Car, the Way I Do?

by: Yoitsu Takahashi

Knowledge distillation interpretability Prototypical method

Simple Multi-Dataset Detection

by: Kazuki Omi

Object detection Multi-Domain learning

KNN Local Attention for Image Restoration

by: Shoma Iwai

Attention Image Restoration

Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning

by: Ryuichi Nakahara

Self supervised learning Navigation

SelfD: Self-Learning Large-Scale Driving Policies From the Web

by: 朝岡忠

Multi-Frame Self-Supervised Depth With Transformers

by: Yasufumi Kawano

Attention Depth estimation Self supervised learning

RigNeRF: Fully Controllable Neural 3D Portraits

by: 鈴木共生

Neural radiance fields (NeRF) Face

What To Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions

by: Kazuki Omi

3D 3D reconstruction Disentanglement

NeRFReN: Neural Radiance Fields With Reflections

by: Naoya Chiba

Human-Object Interaction Detection via Disentangled Transformer

by: Kazuki Omi

Object detection Human-Object Interaction

GraFormer: Graph-Oriented Transformer for 3D Pose Estimation

by: horiem

3D Attention Representation learning

Open Challenges in Deep Stereo: The Booster Dataset

by: Masanori YANO

Dataset Depth estimation Point cloud Segmentation Stereo matching

Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-Robust Makeup Transfer

by: 加藤義道

Adversarial examples Disentanglement GAN Recognition

Episodic Memory Question Answering

by: Yusuke Mori

3D Dataset Multi modal Robustness Vision and language

GazeOnce: Real-Time Multi-Person Gaze Estimation

by: Anonymous

GIFS: Neural Implicit Function for General Shape Representation

by: Naoya Chiba

3D Point cloud Self supervised learning

Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds

by: Naoya Chiba

Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection

by: 志田遥飛

Self supervised learning Anomaly Detection new architectural

Anomaly Detection via Reverse Distillation From One-Class Embedding

by: Hiroki Kobayashi

Knowledge distillation Anomaly detection Anomaly localization

Multimodal Material Segmentation

by: Masanori YANO

Dataset Multi modal Segmentation Semantic segmentation

Directional Self-Supervised Learning for Heavy Image Augmentations

by: 志田遥飛

Representation learning Self supervised learning Hard Augmentation Data Augmentation

AdaFace: Quality Adaptive Margin for Face Recognition

by: 鈴木共生

Face

Align Representations With Base: A New Approach to Self-Supervised Learning

by: 志田遥飛

Self supervised learning Theoretical analysis degenerated solutions

Amodal Panoptic Segmentation

by: Masanori YANO

Attention Dataset Segmentation Panoptic segmentation

End-to-End Semi-Supervised Learning for Video Action Detection

by: Kazuki Omi

Attention Object detection Video Semi-supervised learning

GAN-Supervised Dense Visual Alignment

by: Taiki Sugiura

Cross-Domain Adaptive Teacher for Object Detection

by: Kazuki Omi

Dataset Object detection Robustness Self supervised learning

Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning

by: Yui Iioka (Keio University)

Towards Total Recall in Industrial Anomaly Detection

by: Shunsuke Nakatsuka

Anomaly Detection

Active Learning by Feature Mixing

by: Shunsuke Nakatsuka

Active Learning

Unsupervised Homography Estimation With Coplanarity-Aware GAN

by: 角田良太朗

GAN homography estimation

Iterative Deep Homography Estimation

by: 角田良太朗

homography estimation

BokehMe: When Neural Rendering Meets Classical Rendering

by: 角田良太朗

bokeh

Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish

by: 角田良太朗

vanishing point detection

3D Moments From Near-Duplicate Photos

by: 角田良太朗

3D 3D object detection Point cloud

Bijective Mapping Network for Shadow Removal

by: 角田良太朗

shadow removal

Learning To Generate Line Drawings That Convey Geometry and Semantics

by: 角田良太朗

GAN line drawing

OnePose: One-Shot Object Pose Estimation Without CAD Models

by: 角田良太朗

Deformable Sprites for Unsupervised Video Decomposition

by: 角田良太朗

Instance segmentation Segmentation Semantic segmentation

Panoptic, Instance and Semantic Relations: A Relational Context Encoder To Enhance Panoptic Segmentation

by: 角田良太朗

3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image

by: 角田良太朗

3D Point cloud

Towards Layer-Wise Image Vectorization

by: 角田良太朗

Image Vectorization

Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry

by: 角田良太朗

Deblurring via Stochastic Refinement

by: 角田良太朗

deblur

Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video

by: 角田良太朗

Adaptive Gating for Single-Photon 3D Imaging

by: 角田良太朗

LiDAR

Dual-Shutter Optical Vibration Sensing

by: 角田良太朗

Vibration Sensing

Neural Reflectance for Shape Recovery With Shadow Handling

by: 角田良太朗

GAN Knowledge distillation

Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements

by: 角田良太朗

Robustness

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

by: Taiki Sugiura

Towards Real-World Navigation With Deep Differentiable Planners

by: 朝岡忠

Navigation

SimVP: Simpler Yet Better Video Prediction

by: Masanori YANO

Video Video prediction CNN

CoNeRF: Controllable Neural Radiance Fields

by: Naoya Chiba

3D Disentanglement Neural radiance fields (NeRF) N-shot learning

DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering

by: Keiichi Sawada

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

by: Tomoya Nitta

Dataset Video Vision and language

TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting

by: Jun Kimata

HINT: Hierarchical Neuron Concept Explainer

by: Tomoya Nitta

Recognition Explainable AI

On Guiding Visual Attention With Language Specification

by: QIUYUE

Attention Vision and language

Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training

by: Hirokatsu Kataoka

Knowledge distillation

Class-Incremental Learning With Strong Pre-Trained Models

by: Hirokatsu Kataoka

Class-Incremental Learning

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

by: Hirokatsu Kataoka

3D object detection 3D reconstruction Pose estimation

EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation

by: Takahiro Maeda

Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles

by: Fumiharu Suzuki

Dataset Domain adaptation Segmentation Semantic segmentation

COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

by: Naoto Shirai

Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation

by: Motonari Kambara

3D reconstruction Pose estimation

Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture

by: Takahiro Maeda

End-to-End Reconstruction-Classification Learning for Face Forgery Detection

by: 鈴木共生

Face

Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks

by: Taiki Sugiura

Object detection Feature Selection

Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection

by: Atsuki Osanai

Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis

by: Masanori YANO

3D Point cloud Recognition 3D semantic segmentation

Revealing Occlusions With 4D Neural Fields

by: Naoya Chiba

3D Depth estimation Disentanglement Instance segmentation Point cloud

MAXIM: Multi-Axis MLP for Image Processing

by: 角田良太朗

MLP architecture

Revisiting Temporal Alignment for Video Restoration

by: Jun Kimata

Super resolution Video

HCSC: Hierarchical Contrastive Selective Coding

by: Atsuki Osanai

Representation learning Contrastive Learning

NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning

by: hayamizu ryo

Adversarial examples

Egocentric Prediction of Action Target in 3D

by: Hirokatsu Kataoka

3D Action recognition Dataset

HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction

by: Hirokatsu Kataoka

3D Action recognition Dataset Point cloud Pose estimation Semantic segmentation

Transferable Sparse Adversarial Attack

by: hayamizu ryo

Adversarial examples

The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization

by: Atsuki Osanai

Domain adaptation

Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark

by: Hirokatsu Kataoka

Dataset Recognition Semantic segmentation

Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation

by: 近藤拓未

An Empirical Study of Training End-to-End Vision-and-Language Transformers

by: QIUYUE

Video Temporal Action Localization

Upright-Net: Learning Upright Orientation for 3D Point Cloud

by: hayamizu ryo

Point cloud

Per-Clip Video Object Segmentation

by: Ryunosuke Ishikawa

Segmentation

Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation

by: Ryota Hashiguchi

Unsupervised Domain Generalization by Learning a Bridge Across Domains

by: Anonymous

Domain adaptation

Are Multimodal Transformers Robust to Missing Modality?

by: QIUYUE

Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors

by: Hirokatsu Kataoka

3D object detection Action recognition Dataset

JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection

by: Hirokatsu Kataoka

A Self-Supervised Descriptor for Image Copy Detection

by: Atsuki Osanai

Contrastive Learning

DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover’s Distance Improves Out-of-Distribution Face Identification

by: 前野一樹

Adversarial examples Recognition Robustness Face Identification

SPAct: Self-Supervised Privacy Preservation for Action Recognition

by: 志田遥飛

Action recognition Self supervised learning Privacy Privacy Preservation

Learning To Answer Questions in Dynamic Audio-Visual Scenarios

by: QIUYUE

Dataset Video Vision and language

Learning Where To Learn in Cross-View Self-Supervised Learning

by: 志田遥飛

Attention Recognition Pruning

Patch Slimming for Efficient Vision Transformers

by: Sora Takashima （高島空良）

BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild

by: QIUYUE

Dataset Multi modal Vision and language

Instance-Wise Occlusion and Depth Orders in Natural Scenes

by: Hirokatsu Kataoka

Dataset Depth estimation Semantic segmentation

DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation

by: Hirokatsu Kataoka

Dataset Semantic segmentation

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

by: Hirokatsu Kataoka

Action recognition Dataset

Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency

by: Anonymous

Representation learning Segmentation Semantic segmentation Self supervised learning Video

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

by: QIUYUE

N-shot learning Vision and language

Cross Modal Retrieval With Querybank Normalisation

by: Mikihiro Tanaka

TCTrack: Temporal Contexts for Aerial Tracking

by: Jun Kimata

Domain adaptation Semantic segmentation

Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution

by: 角田良太朗

GAN Super resolution

CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows

by: 鈴木共生

Attention Transformer

Semantic-Aware Domain Generalized Segmentation

by: Takehiro Matsuda

Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images

by: Naoya Chiba

3D 3D reconstruction Depth estimation Pose estimation

Disentangling Visual and Written Concepts in CLIP

by: Anonymous

Representation learning Self supervised learning Video Hierarchical Consistency learning

Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency

by: 志田遥飛

CMT: Convolutional Neural Networks Meet Vision Transformers

by: 鈴木共生

Attention Object detection Recognition Segmentation

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

by: Naoya Chiba

3D 3D object detection Object detection Point cloud Pose estimation

A Hybrid Quantum-Classical Algorithm for Robust Fitting

by: 石井央

3D reconstruction Robustness Optimization method Quantum computing

Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading

by: 堀涼介

Action recognition Dataset Recognition Video Vision and language

3D Scene Painting via Semantic Image Synthesis

by: QIUYUE

HairMapper: Removing Hair From Portraits Using GANs

by: Anonymous

Learning Affordance Grounding From Exocentric Images

by: QIUYUE

Dataset Self supervised learning

Structured Sparse R-CNN for Direct Scene Graph Generation

by: Yoshiki Nagasaki

Scene Graph Generation

When Does Contrastive Visual Representation Learning Work?

by: S Ishikawa

How Much Does Input Data Type Impact Final Face Model Accuracy?

by: 鈴木共生

3D 3D reconstruction Face

It’s About Time: Analog Clock Reading in the Wild

by: Masanori YANO

Dataset Recognition Sim2real

AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis

by: Naoya Chiba

3D 3D object detection Disentanglement Point cloud Self supervised learning

Learning To Learn Across Diverse Data Biases in Deep Face Recognition

by: 鈴木共生

Meta learning Recognition Face

StyleSwin: Transformer-Based GAN for High-Resolution Image Generation

by: Anonymous

Attention GAN

Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps

by: Anonymous

Domain adaptation Super resolution

Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

by: 角田良太朗

Discrete Cosine Transform Network for Guided Depth Map Super-Resolution

by: 角田良太朗

Depth estimation Super resolution

Enhancing Face Recognition With Self-Supervised 3D Reconstruction

by: 鈴木共生

3D 3D reconstruction Self supervised learning Face

ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes

by: Naoya Chiba

3D Point cloud Pose estimation Self supervised learning

Attentive Fine-Grained Structured Sparsity for Image Restoration

by: Masanori YANO

Super resolution Image restoration Deblurring Pruning Sparsity

DisARM: Displacement Aware Relation Module for 3D Detection

by: Naoya Chiba

3D 3D object detection Point cloud Pose estimation

Rethinking Controllable Variational Autoencoders

by: Yusuke Mori

Disentanglement Representation learning

An Efficient Training Approach for Very Large Scale Face Recognition

by: 鈴木共生

Recognition Face

Language As Queries for Referring Video Object Segmentation

by: Ryuichi Nakahara

Segmentation Video Vision and language

Weakly Supervised High-Fidelity Clothing Model Generation

by: Risa Shinoda

3D GAN Neural radiance fields (NeRF)

Deep Rectangling for Image Stitching: A Learning Baseline

by: 角田良太朗

Image stitching

CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation

by: 角田良太朗

Optical flow Point cloud

GIRAFFE HD: A High-Resolution 3D-Aware Generative Model

by: Anonymous

Towards an End-to-End Framework for Flow-Guided Video Inpainting

by: 角田良太朗

Video Inpainting

CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism

by: 古川遼

Self supervised learning Denoise

CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image

by: Anonymous

AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network

by: Anonymous

Self supervised learning Denoise

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

by: Ryo Miyoshi

Dataset Video Facial Expression Recognition

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

by: 鈴木共生

3D 3D reconstruction Face

Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation

by: Yuta Hamada

Dense Depth Priors for Neural Radiance Fields From Sparse Input Views

by: Kazuhito Sato

Negative-Aware Attention Framework for Image-Text Matching

by: Ryou Mutou

Self supervised learning Gradient Framework

Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework

by: 志田遥飛

Learning To Solve Hard Minimal Problems

by: yoshiki miyazawa

Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation

by: Yuma Ochi

SNN

It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

by: hayamizu ryo

3D Neural radiance fields (NeRF)

M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining

by: hayamizu ryo

Dataset Multi modal

GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation

by: hayamizu ryo

Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data

by: Anonymous

Optical flow Pose estimation

IFOR: Iterative Flow Minimization for Robotic Object Rearrangement

by: Takahiro Suzuki

Selective-Supervised Contrastive Learning With Noisy Labels

by: hayamizu ryo

3D Dataset Human Scene Contact

Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training

by: hayamizu ryo

Adversarial examples

Capturing and Inferring Dense Full-Body Human-Scene Contact

by: Kosuke Fuazawa

Event-Based Video Reconstruction via Potential-Assisted Spiking Neural Network

by: Yuma Ochi

SNN Image reconstruction

Progressive End-to-End Object Detection in Crowded Scenes

by: Anonymous

3D Attention Neural radiance fields (NeRF) Transformer

GeoNeRF: Generalizing NeRF With Geometry Priors

by: Masanori YANO

Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering

by: Ryuichi Nakahara

3D 3D object detection 3D reconstruction Depth estimation Object detection Pose estimation

C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image

by: Ryuichi Nakahara

Semantic segmentation

Learning 3D Object Shape and Layout Without 3D Supervision

by: Naoya Chiba

Rethinking Efficient Lane Detection via Curve Modeling

by: Yuma Ochi

Autonomous driving Lane detection

Accurate 3D Body Shape Regression Using Metric and Semantic Attributes

by: hayamizu ryo

3D Person re-identification

V2C: Visual Voice Cloning

by: hayamizu ryo

BEVT: BERT Pretraining of Video Transformers

by: hayamizu ryo

Adversarial examples Recognition Face

Simulated Adversarial Testing of Face Recognition Models

by: 鈴木共生

OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction

by: Naoya Chiba

3D 3D reconstruction Point cloud Pose estimation Video

AKB-48: A Real-World Articulated Object Knowledge Base

by: hayamizu ryo

3D 3D object detection Dataset

Neural Convolutional Surfaces

by: Takashi Imoto

3D 3D object detection

Contour-Hugging Heatmaps for Landmark Detection

by: Ryuichi Nakahara

Attention Segmentation Semantic segmentation

Revisiting Near/Remote Sensing With Geospatial Attention

by: Masanori YANO

Brain-Inspired Multilayer Perceptron With Spiking Neurons

by: Yuma Ochi

Object detection Semantic segmentation SNN MLP

Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations

by: 角田良太朗

GAN Super resolution Invertible image restoration

SAR-Net: Shape Alignment and Recovery Network for Category-Level 6D Object Pose and Size Estimation

by: Naoya Chiba

3D Point cloud Pose estimation Self supervised learning

Which Images To Label for Few-Shot Medical Landmark Detection?

by: Ryuichi Nakahara

Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations

by: Ryuichi Nakahara

3D 3D object detection Depth estimation Object detection Point cloud Pose estimation

The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

by: Ryota Hashiguchi

Scene Graph Generation

DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation

by: Naoya Chiba

BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation

by: Ryuichi Nakahara

Adversarial examples Meta learning Recognition Face

Spiking Transformers for Event-Based Single Object Tracking

by: Yuma Ochi

Object tracking SNN

Exploring Frequency Adversarial Attacks for Face Forgery Detection

by: 鈴木共生

DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis

by: Ryuichi Nakahara

ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics

by: Ryuichi Nakahara

Self supervised learning Video Vision and language

Semi-Supervised Video Paragraph Grounding With Contrastive Encoder

by: QIUYUE

HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet

by: Ryuichi Nakahara

Neural architecture search (NAS)

M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer

by: Ryuichi Nakahara

3D

Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis

by: Ryuichi Nakahara

3D

Object-Aware Video-Language Pre-Training for Retrieval

by: QIUYUE

Self supervised learning Video Vision and language

Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time

by: Anonymous

3D 3D reconstruction Neural radiance fields (NeRF) Video

Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization

by: Ryuichi Nakahara

Segmentation

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

by: QIUYUE

Attention Super resolution Video

Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation

by: Ryuichi Nakahara

Segmentation

IFOR: Iterative Flow Minimization for Robotic Object Rearrangement

by: Takahiro Suzuki

3D Optical flow

PNP: Robust Learning From Noisy Labels by Probabilistic Noise Prediction

by: Kazuki Omi

Recognition Noisy Labels

WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation

by: hayamizu ryo

3D GAN Point cloud

Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography

by: Ryuichi Nakahara

Denoising

Learning Trajectory-Aware Transformer for Video Super-Resolution

by: 角田良太朗

Parametric Scattering Networks

by: 角田良太朗

Wavelet

Comprehending and Ordering Semantics for Image Captioning

by: Ryosuke Oshima

Dataset Multi modal Vision and language

Coupled Iterative Refinement for 6D Multi-Object Pose Estimation

by: Naoya Chiba

3D 3D object detection Point cloud Pose estimation

Online Learning of Reusable Abstract Models for Object Goal Navigation

by: Anonymous

Action recognition Reinforcement learning

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

by: 朝岡忠

Vision and language General purpose vision

A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty

by: Yuma Ochi

Sampling Imbalanced dataset

Towards Low-Cost and Efficient Malaria Detection

by: Shunsuke Yoshizawa

Dataset Domain adaptation Object detection

MatteFormer: Transformer-Based Image Matting via Prior-Tokens

by: 志田遥飛

Semantic segmentation Image Matting Transformer

Debiased Learning From Naturally Imbalanced Pseudo-Labels

by: Yuma Ochi

SSL ZSL Pseudo-Labels

Towards Accurate Facial Landmark Detection via Cascaded Transformers

by: 鈴木共生

Attention Face

Contrastive Boundary Learning for Point Cloud Segmentation

by: 志田遥飛

3D Semantic segmentation

Learning to Deblur Using Light Field Generated and Real Defocus Images

by: 角田良太朗

Defocus deblur

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

by: 志田遥飛

Attention Inpainting, Transformer and CNN

Kubric: A Scalable Dataset Generator

by: Yuma Ochi

Dataset Video

Restormer: Efficient Transformer for High-Resolution Image Restoration

by: 角田良太朗

Attention

Exploring Effective Data for Surrogate Training Towards Black-Box Attack

by: 志田遥飛

Adversarial examples Attack, Defense

Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique

by: Anonymous

Activation function

Pyramid Grafting Network for One-Stage High Resolution Saliency Detection

by: 志田遥飛

Semantic segmentation

On the Road to Online Adaptation for Semantic Image Segmentation

by: Anonymous

Semantic segmentation

UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation

by: Atsuki Osanai

Semantic segmentation Self-supervised learning

Frame Averaging for Equivariant Shape Space Learning

by: 古川遼

Semi-supervised learning Contrastive learning

Class-Aware Contrastive Semi-Supervised Learning

by: Atsuki Osanai

DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering

by: Anonymous

ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

by: Naoya Chiba

Dataset Segmentation Video VOS

YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset

by: Masanori YANO

Burst Image Restoration and Enhancement

by: 角田良太朗

Super resolution Denoising Enhancement Burst photography

Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds

by: Yuki Kohara

3D object detection Point cloud

Active Teacher for Semi-Supervised Object Detection

by: Kiyoshi Hashimoto

Knowledge distillation Recognition Vision and language

Integrating Language Guidance Into Vision-Based Deep Metric Learning

by: 鈴木共生

Multi-Granularity Alignment Domain Adaptation for Object Detection

by: Anonymous

3D 3D object detection Object detection

ABO: Dataset and Benchmarks for Real-World 3D Object Understanding

by: Yusuke Mori

3D 3D reconstruction Dataset

Do Learned Representations Respect Causal Relationships?

by: worldblue

3D 3D object detection Dataset

Long-Tailed Recognition via Weight Balancing

by: Yuma Ochi

Recognition LTR

Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction

by: Anonymous

Video Video Coding

Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination

by: Anonymous

Face

Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization

by: Eisuke Yamagata

Network compression

PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects

by: hayamizu ryo

Open-Set Text Recognition via Character-Context Decoupling

by: yoshiki miyazawa

Attention N-shot learning Recognition Robustness

Revisiting Skeleton-Based Action Recognition

by: Masanori YANO

Action recognition Pose estimation Video

Scaling Up Vision-Language Pre-Training for Image Captioning

by: Ryosuke Oshima

Dataset Multi modal Vision and language

Semi-Supervised Few-Shot Learning via Multi-Factor Clustering

by: Atsuki Osanai

Few-shot learning Semi-supervised learning

Globetrotter: Connecting Languages by Connecting Images

by: Ryosuke Oshima

A Keypoint-Based Global Association Network for Lane Detection

by: Masanori YANO

Lane detection

Cycle-Consistent Counterfactuals by Latent Transformations

by: 古澤嘉久

3D 3D object detection Pose estimation

ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation

by: Naoya Chiba

PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos

by: Masanori YANO

3D Dataset Vision and language

ScanQA: 3D Question Answering for Spatial Scene Understanding

by: Anonymous

Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective

by: 髙橋秀弥

Robust Cross-Modal Representation Learning With Progressive Self-Distillation

by: Yui Iioka (Keio University)

Knowledge distillation Multi modal Representation learning Robustness Self supervised learning Vision and language

Event-Aided Direct Sparse Odometry

by: Godel

Dataset Visual Odometry, Event-based Camera, Direct Method

EDTER: Edge Detection With Transformer

by: Masanori YANO

Edge detection Transformer

SNR-Aware Low-Light Image Enhancement

by: Fumiharu Suzuki

Attention Recognition

VALHALLA: Visual Hallucination for Machine Translation

by: Tosho Hirasawa

N-shot learning Recognition Representation learning Vision and language

LiT: Zero-Shot Transfer With Locked-Image Text Tuning

by: Sora Takashima （高島空良）

Convolution of Convolution: Let Kernels Spatially Collaborate

by: 志田遥飛

Convolution

SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis

by: Taiki Sugiura