ECCV 2020 Paper Summaries

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

by: Shintaro Yamamoto

Object-and-Action Aware Model for Visual Language Navigation

by: Yue Qiu

3D 3D object detection Multi modal Vision and language

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

by: Yue Qiu

Active Visual Information Gathering for Vision-Language Navigation

by: Yue Qiu

3D 3D reconstruction Multi modal Vision and language

Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

by: Yue Qiu

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

by: Yue Qiu

3D 3D reconstruction Point cloud Super resolution

Convolutional Occupancy Networks

by: Naoya Chiba

Curriculum DeepSDF

by: Naoya Chiba

3D 3D reconstruction Point cloud Representation learning

Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs

by: Seitaro Shinagawa

3D Knowledge distillation Multi modal Vision and language

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler

by: Yue Qiu

3D Vision and language

Soft Expert Reward Learning for Vision-and-Language Navigation

by: Yue Qiu

Multi-Agent Embodied Question Answering in Interactive Environments

by: Shintaro Yamamoto

3D reconstruction Vision and language

Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

by: Yue Qiu

Segmentation Semantic segmentation

Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation

by: Shintaro Yamamoto

DTVNet: Dynamic Time-lapse Video Generation via Single Still Image

by: Yukitaka Tsuchiya

GAN Video

CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

by: Keisuke Kamahori

GAN Multi modal Vision and language

Connecting Vision and Language with Localized Narratives

by: Keisuke Kamahori

3D 3D reconstruction Point cloud Super resolution

Points2Surf: Learning Implicit Surfaces from Point Clouds

by: Naoya Chiba

Progressive Point Cloud Deconvolution Generation Network

by: Naoya Chiba

3D Point cloud Super resolution

Texture Hallucination for Large-Factor Painting Super-Resolution

by: Teppei Kurita

Dataset Super resolution

Polarimetric Multi-View Inverse Rendering

by: Teppei Kurita

3D reconstruction Polarization

Table Structure Recognition using Top-Down and Bottom-Up Cues

by: Shintaro Yamamoto

Document recognition

Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

by: Naoya Chiba

3D Depth estimation Multi modal Point cloud Segmentation Semantic segmentation

Meshing Point Clouds with Predicted Intrinsic-Extrinsic Ratio Guidance

by: Naoya Chiba

3D Adversarial examples Vision and language

Spatiotemporal Attacks for Embodied Agents

by: Yue Qiu

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

by: Yue Qiu

SoundSpaces: Audio-Visual Navigation in 3D Environments

by: Yue Qiu

3D Vision and language

Learning Object Relation Graph and Tentative Policy for Visual Navigation

by: Yue Qiu

Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder

by: Shintaro Yamamoto

Adaptive Offline Quintuplet Loss for Image-Text Matching

by: Shintaro Yamamoto

Multi modal Recognition Vision and language

Character Grounding and Re-Identification in Story of Videos and Text Descriptions

by: Keisuke Kamahori

SODA: Story Oriented Dense Video Captioning Evaluation Framework

by: Keisuke Kamahori

Video Vision and language

LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities

by: Shun.ishizaka

Action recognition Dataset Video Vision and language

Semantic Curiosity for Active Visual Learning

by: Shun.ishizaka

Object detection Active visual learning

MovieNet: A Holistic Dataset for Movie Understanding

by: Shun.ishizaka

Dataset Person re-identification Video Scene understanding

Lifespan Age Transformation Synthesis

by: Teppei Kurita

Dataset GAN Age Transformation

3D Human Shape Reconstruction from a Polarization Image

by: Teppei Kurita

3D reconstruction Pose estimation Polarization

GRNet: Gridding Residual Network for Dense Point Cloud Completion

by: Naoya Chiba

Instance segmentation Object detection Framework Evaluation

TIDE: A General Toolbox for Identifying Object Detection Errors

by: Ryota Suzuki

PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

by: Teppei Kurita

Dataset GAN Quality Assessment

Image Classification in the Dark using Quanta Image Sensors

by: Teppei Kurita

Knowledge distillation Recognition Quanta Image Sensor

Occupancy Anticipation for Efficient Exploration and Navigation

by: Yue Qiu

3D 3D reconstruction Multi modal Vision and language

Are Labels Necessary for Neural Architecture Search?

by: Hirokatsu Kataoka

Neural architecture search (NAS) Self supervised learning

Learning to Learn Words from Visual Scenes

by: Shintaro Yamamoto

Meta learning Vision and language

Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions

by: Naoya Chiba

3D Point cloud Representation learning Semantic segmentation Self supervised learning

Foley Music: Learning to Generate Music from Videos

by: Yukitaka Tsuchiya

Multi modal Sound Audio

Towards Unique and Informative Captioning of Images

by: Keisuke Kamahori

Representation learning Vision and language

Learning Visual Representations with Caption Annotations

by: Keisuke Kamahori

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

by: Hirokatsu Kataoka

3D object detection Point cloud Representation learning

VisualEchoes: Spatial Image Representation Learning through Echolocation

by: Masuyama Yoshiki

3D Depth estimation Multi modal Self supervised learning Audio and visual

Scene Text Image Super-resolution in the wild

by: Keisuke Kamahori

Action recognition Human-object interaction

Asynchronous Interaction Aggregation for Action Detection

by: Shun.ishizaka

AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds

by: Naoya Chiba

3D Adversarial examples Point cloud

Filter Style Transfer between Photos

by: Teppei Kurita

Style Transfer

Weakly Supervised 3D Object Detection from Lidar Point Cloud

by: Naoya Chiba

3D 3D object detection N-shot learning Point cloud Pose estimation

Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision

by: Teppei Kurita

Dataset Recognition Video Violence Detection

Learning Joint Spatial-Temporal Transformations for Video Inpainting

by: Yukitaka Tsuchiya

GAN Video Inpainting

Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

by: Keisuke Kamahori

Recognition Robustness Vision and language

We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

by: Shun.ishizaka

Representation learning Video Vision and language Video understanding

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

by: Naoya Chiba

3D 3D object detection Attention Object detection Point cloud

Streaming Object Detection for 3-D Point Clouds

by: Naoya Chiba

3D 3D object detection Object detection Point cloud

TSIT: A Simple and Versatile Framework for Image-to-Image Translation

by: Teppei Kurita

Image-to-Image Translation

ScribbleBox: Interactive Annotation Framework for Video Object Segmentation

by: Shun.ishizaka

Dataset Segmentation Annotation Interactive segmentation

Feature-metric Loss for Self-supervised Learning of Depth and Egomotion

by: Shoji Sonoyama

Depth estimation Self supervised learning

PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling

by: Naoya Chiba

3D Point cloud Super resolution

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

by: Keisuke Kamahori

Generating Handwriting via Decoupled Style Descriptors

by: Keisuke Kamahori

Attention Multi modal Self supervised learning Audio and visual

Self-Supervised Learning of Audio-Visual Objects from Video

by: Masuyama Yoshiki

PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click

by: Yue Qiu

Semantic segmentation Vision and language

N-shot learning Vision and language

Visual Question Answering on Image Sets

by: Yue Qiu

3D 3D reconstruction Dataset Vision and language

Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards

by: Keisuke Kamahori

Object detection Recognition

Single-Image Depth Prediction Makes Feature Matching Easier

by: Teppei Kurita

Dataset Feature Matching

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

by: Teppei Kurita

3D Dataset

Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild

by: Hiroaki Aizawa

Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors

by: Hiroaki Aizawa

Knowledge distillation N-shot learning Recognition

Rethinking Few-shot Image Classification: A Good Embedding is All You Need?

by: Hiroaki Aizawa

Shape and Viewpoint without Keypoints

by: Hiroaki Aizawa

Meta learning N-shot learning Self supervised learning

When Does Self-supervision Improve Few-shot Learning?

by: Hiroaki Aizawa

Length-Controllable Image Captioning

by: Keisuke Kamahori

3D Pose estimation Camera calibration

Infrastructure-based Multi-Camera Calibration using Radial Projections

by: Shoji Sonoyama

Comprehensive Image Captioning via Scene Graph Decomposition

by: Keisuke Kamahori

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

by: Keisuke Kamahori

Attention Dataset Segmentation

Segmenting Transparent Objects in the Wild

by: Teppei Kurita

Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild

by: Teppei Kurita

Pose estimation Unselfie

VQA-LOL: Visual Question Answering under the Lens of Logic

by: Yue Qiu

Attention Vision and language

TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering

by: Yue Qiu

A Broader Study of Cross-Domain Few-Shot Learning

by: Hiroaki Aizawa

N-shot learning

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

by: Keisuke Kamahori

DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction

by: Naoya Chiba

3D 3D reconstruction Self supervised learning

Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction

by: Naoya Chiba

PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit

by: Keisuke Kamahori

N-shot learning Vision and language

Adaptive Text Recognition through Visual Matching

by: Keisuke Kamahori

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

by: Yue Qiu

Adversarial examples Vision and language

Defocus Blur Detection via Depth Distillation

by: Teppei Kurita

Depth estimation Knowledge distillation Defocus Blur Detection

RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects

by: Teppei Kurita

3D object detection Object detection Radar LiDAR

Learning Graph-Convolutional Representations for Point Cloud Denoising

by: Naoya Chiba

3D Point cloud Self supervised learning

JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds

by: Naoya Chiba

3D Point cloud Segmentation Semantic segmentation

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving

by: Yue Qiu

3D 3D object detection Object detection

Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation

by: Keisuke Kamahori

Dataset Multi modal Vision and language

Contrastive Learning for Unpaired Image-to-Image Translation

by: Teppei Kurita

Contrastive Learning Image-to-Image Translation Taesung

Wavelet-Based Dual-Branch Network for Image Demoiréing

by: Teppei Kurita

Dataset Wavelet Demoire

Captioning Images Taken by People Who Are Blind

by: Yue Qiu

Object detection Vision and language

Learning to Generate Grounded Visual Captions without Localization Supervision

by: Yue Qiu

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

by: Yue Qiu

Video Vision and language

Self-Prediction for Joint Instance and Semantic Segmentation of Point Clouds

by: Naoya Chiba

3D Point cloud Representation learning Segmentation Semantic segmentation

A Closer Look at Local Aggregation Operators in Point Cloud Analysis

by: Naoya Chiba

Reflection Separation Polarization

Reflection Separation via Multi-bounce Polarization State Tracing

by: Teppei Kurita

Low Light Video Enhancement using Synthetic Data Produced with an Intermediate Domain Mapping

by: Teppei Kurita

Domain adaptation Video Enhancement

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

by: Yue Qiu

Self supervised learning Video

Learning Predictive Models from Observation and Interaction

by: Yue Qiu

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

by: Keisuke Kamahori

Multi modal Video Vision and language

Discrete Point Flow Networks for Efficient Point Cloud Generation

by: Naoya Chiba

3D Point cloud Representation learning

Iterative Distance-Aware Similarity Matrix Convolution with Mutual-Supervised Point Elimination for Efficient Point Cloud Registration

by: Naoya Chiba

3D Point cloud Pose estimation

Erasing Appearance Preservation in Optimization-based Smoothing

by: Teppei Kurita

Optimization-based Smoothing

Rethinking Image Deraining via Rain Streaks and Vapors

by: Teppei Kurita

Deraining

Detail Preserved Point Cloud Completion via Separated Feature Aggregation

by: Naoya Chiba

3D 3D object detection Object detection Point cloud Robustness

SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds

by: Naoya Chiba

Guessing State Tracking for Visual Dialogue

by: Seitaro Shinagawa

Multi modal Video Vision and language

Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language

by: Keisuke Kamahori

Identity-Aware Multi-Sentence Video Description

by: Keisuke Kamahori

Video Vision and language

Burst Denoising via Temporally Shifted Wavelet Transforms

by: Teppei Kurita

Wavelet Transforms Burst Denoising

From Shadow Segmentation to Shadow Removal

by: Teppei Kurita

Segmentation Shadow Removal

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

by: Keisuke Kamahori

3D 3D object detection Point cloud Segmentation Semantic segmentation

Orderly Disorder in Point Cloud Domain

by: Naoya Chiba

PointMixup: Augmentation for Point Clouds

by: Naoya Chiba

3D 3D object detection Point cloud

Calibration-free Structure-from-Motion with Calibrated Radial Trifocal Tensors

by: Shoji Sonoyama

Neural architecture search (NAS) Vision and language

AutoSTR: Efficient Backbone Search for Scene Text Recognition

by: Keisuke Kamahori

An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension

by: Keisuke Kamahori

Learning to Scale Multilingual Representations for Vision-Language Tasks

by: Shintaro Yamamoto

Optical Character Recognition

LEED: Label-Free Expression Editing via Disentanglement

by: Teppei Kurita

Expression Editing

Can You Read Me Now? Content Aware Rectification using Angle Supervision

by: Teppei Kurita

Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images

by: Yue Qiu

Active Perception using Light Curtains for Autonomous Driving

by: Yue Qiu

3D Attention Point cloud Representation learning

Mapping in a Cycle: Sinkhorn Regularized Unsupervised Learning for Point Cloud Shapes

by: Naoya Chiba

FLOT: Scene Flow on Point Clouds guided by Optimal Transport

by: Naoya Chiba

3D Point cloud

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

by: Yukitaka Tsuchiya

Audio Binaural U-Net

Nighttime Defogging Using High-Low Frequency Decomposition and Grayscale-Color Networks

by: Teppei Kurita

Defogging

GeLaTO: Generative Latent Textured Objects

by: Teppei Kurita

3D Representation learning

Multimodal Shape Completion via Conditional Generative Adversarial Networks

by: Yue Qiu

3D GAN

Impact of base dataset design on few-shot image classification

by: Hiroaki Aizawa

N-shot learning

Sound2Sight: Generating Visual Dynamics from Sound and Context

by: Yukitaka Tsuchiya

GAN Multi modal Video Sound Transformer

PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations

by: Naoya Chiba

3D 3D reconstruction Point cloud Representation learning Self supervised learning

Dual Grid Net: Hand Mesh Vertex Regression from Single Depth Maps

by: Naoya Chiba

3D 3D reconstruction Depth estimation Point cloud Pose estimation Self supervised learning

Learning to See in the Dark with Events

by: Teppei Kurita

High Dynamic Range Event Camera Low Light Imaging

Learning to Factorize and Relight a City

by: Teppei Kurita

Relighting Factorization

I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image

by: Naoya Chiba

3D 3D reconstruction Pose estimation

Unsupervised Shape and Pose Disentanglement for 3D Meshes

by: Naoya Chiba

3D 3D reconstruction Disentanglement Pose estimation Representation learning

Attentive Normalization

by: Teppei Kurita

Attention Normalization

Dynamic Low-light Imaging with Quanta Image Sensors

by: Teppei Kurita

Knowledge distillation Quanta Image Sensor Noise Reduction

Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose

by: Naoya Chiba

3D Pose estimation

DEMEA: Deep Mesh Autoencoders for Non-Rigidly Deforming Objects

by: Naoya Chiba

3D 3D reconstruction Representation learning

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

by: Teppei Kurita

Semantic segmentation

Generative Sparse Detection Networks for 3D Single-shot Object Detection

by: Yue Qiu

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

by: Yue Qiu

Hallucinating Visual Instances in Total Absentia

by: Yue Qiu

GAN

Sequential Deformation for Accurate Scene Text Detection

by: Keisuke Kamahori

Dataset Video Vision and language

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

by: Keisuke Kamahori

SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates

by: Naoya Chiba

3D 3D reconstruction Attention Representation learning Self supervised learning

Polarized Optical-Flow Gyroscope

by: Teppei Kurita

Optical flow Polarization

Full-Time Monocular Road Detection Using Zero-Distribution Prior of Angle of Polarization

by: Teppei Kurita

Road Detection Polarization

Image-based table recognition: data, model, and evaluation

by: Keisuke Kamahori

Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval

by: Keisuke Kamahori

Relighting SVBRDF Inverse Rendering

Single-Shot Neural Relighting and SVBRDF Estimation

by: Teppei Kurita

Linguistic Structure Guided Context Modeling for Referring Image Segmentation

by: Yue Qiu

Instance segmentation Vision and language

Learning to Plan with Uncertain Topological Maps

by: Yue Qiu

3D

Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation

by: Naoya Chiba

3D 3D object detection Object detection Point cloud Pose estimation

TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video

by: Naoya Chiba

3D 3D reconstruction Representation learning Self supervised learning Video

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

by: Atsuki Osanai

Semantic segmentation

A Generic Visualization Approach for Convolutional Neural Networks

by: Teppei Kurita

Attention Retrieval

Colorization of Depth Map via Disentanglement

by: Teppei Kurita

Depth Colorization

Surface Normal Estimation of Tilted Images via Spatial Rectifier

by: Teppei Kurita

Surface Normal

Binarized Neural Network for Single Image Super Resolution

by: Teppei Kurita

Super resolution Binarized Neural Network

Grounded Situation Recognition

by: Masanori YANO

Object detection Recognition Vision and language

Tracking Objects as Points

by: Yue Qiu

Object detection

Large Scale Holistic Video Understanding

by: Yue Qiu

3D Multi modal Video

Spatially Aware Multimodal Transformers for TextVQA

by: Yue Qiu

3D 3D object detection Point cloud

SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans

by: Yue Qiu

Multiview Detection with Feature Perspective Transformation

by: Hayata Ebisawa

Object detection Recognition

Semantic View Synthesis

by: 古川遼

GAN Semantic segmentation Novel view synthesis

Dive Deeper Into Box for Object Detection

by: Masanori YANO

Object detection

Dynamic ReLU

by: Masanori YANO

Recognition

Joint 3D Layout and Depth Prediction from a Single Indoor Panorama Image

by: Teppei Kurita

3D Depth estimation Layout Prediction

Traffic Accident Benchmark for Causality Recognition

by: Masanori YANO

Dataset Recognition Video

Task-Aware Quantization Network for JPEG Image Compression

by: Teppei Kurita

JPEG Image Compression

DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points

by: Naoya Chiba

3D 3D reconstruction Depth estimation

Occlusion-Aware Depth Estimation with Adaptive Normal Constraints

by: Naoya Chiba

3D 3D object detection Depth estimation Video

Dual Refinement Underwater Object Detection Network

by: Teppei Kurita

Object detection Under Water

Hierarchical Kinematic Human Mesh Recovery

by: Naoya Chiba

3D Pose estimation

P²Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation

by: Naoya Chiba

3D Depth estimation Self supervised learning Video

Rethinking Bottleneck Structure for Efficient Mobile Network Design

by: Masanori YANO

Recognition

BCNet: Learning Body and Cloth Shape from A Single Image

by: Teppei Kurita