ECCV2022 Paper Summaries
MultiMAE: Multi-modal Multi-task Masked Autoencoders
by: Yui Iioka @Keio-Univ.
Attention
Depth estimation
Multi modal
Segmentation
Semantic segmentation
Learning Ego 3D Representation As Ray Tracing
by: Shoji Sonoyama
3D object detection
Segmentation
BEV
Towards Grand Unification of Object Tracking
by: Masanori Yano
Video
Object tracking
SOT
MOT
VOS
MOTS
Hunting Group Clues with Transformers for Social Group Activity Recognition
by: Chihiro Nakatani (中谷千洋)
Action recognition
Graph Neural Network for Cell Tracking in Microscopy Videos
by: Ryuichi Nakahara
Object detection
Cell tracking
Relationformer: A Unified Framework for Image-to-Graph Generation
by: Ryuichi Nakahara
Object detection
Medical
PointScatter: Point Set Representation for Tubular Structure Extraction
by: Ryuichi Nakahara
Point cloud
Medical
Differentiable Zooming for Multiple Instance Learning on Whole-Slide Images
by: Ryuichi Nakahara
Medical
Self-Supervised Sparse Representation for Video Anomaly Detection
by: Shota Nishiyama
Video
Anomaly detection
Detecting Twenty-Thousand Classes Using Image-Level Supervision
by: Hirokatsu Kataoka
Dataset
Object detection
Recognition
Abstracting Sketches through Simple Primitives
by: Hirokatsu Kataoka
Dataset
Self-supervised learning
Interpretability
DeiT III: Revenge of the ViT
by: Hirokatsu Kataoka
Recognition
Representation learning
Semantic segmentation
MVP: Multimodality-Guided Visual Pre-training
by: Hirokatsu Kataoka
Representation learning
Self-supervised learning
MovieCuts: A New Dataset and Benchmark for Cut Type Recognition
by: Hirokatsu Kataoka
Action recognition
Dataset
Video
Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation
by: Ryuichi Nakahara
Pose estimation
DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation
by: Ryuichi Nakahara
Pose estimation
Knowledge Condensation Distillation
by: Hirokatsu Kataoka
Knowledge distillation
Representation learning
Self-Supervision Can Be a Good Few-Shot Learner
by: Hirokatsu Kataoka
Self-supervised learning
Few-shot Learning
Are Vision Transformers Robust to Patch Perturbations?
by: Shuya Takahashi (髙橋 秀弥)
Adversarial examples
Object detection
Robustness
OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
by: Shinnosuke Matsufusa
Robustness
Cross-Modal Knowledge Transfer without Task-Relevant Source Data
by: Shinnosuke Matsufusa
Depth estimation
Multi modal
Improving Robustness by Enhancing Weak Subnets
by: Shinnosuke Matsufusa
Adversarial examples
Neural architecture search (NAS)
Robustness
Real-Time Online Video Detection with Temporal Smoothing Transformers
by: Shinnosuke Matsufusa
Video
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
by: Shinnosuke Matsufusa
Recognition
Robustness
SeqFormer: Sequential Transformer for Video Instance Segmentation
by: Shinnosuke Matsufusa
Segmentation
Video
Understanding the Dynamics of DNNs Using Graph Modularity
by: Shuya Takahashi (髙橋 秀弥)
Recognition
Representation learning
Explainable AI for CV
Dense Siamese Network for Dense Unsupervised Learning
by: Ryo Nakamura
Contributions of Shape, Texture, and Color in Visual Recognition
by: Shuya Takahashi (髙橋 秀弥)
Recognition
Explainable AI
TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation
by: Seitaro Shinagawa
Vision and language