Summarized by: Anonymous
Recurrent Glimpse-Based Decoder for Detection With Transformer


REGO reduces the training difficulty of DETR by appending recurrent glimpse-based decoders (REGO) after the DETR decoders; REGO computes attention only within enlarged RoI regions derived from the boxes already detected by DETR.


This is the first time RoI-based refinement is applied to an attention model to achieve effective sparse attention: queries are learned only from the features inside the enlarged RoIs, while keys and values are learned from the decoder features.
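The query/key/value split above can be sketched as a cross-attention module where the flattened enlarged-RoI features form the queries and the DETR decoder outputs form the keys and values. The class name, dimensions, and use of a single `nn.MultiheadAttention` layer are simplifying assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GlimpseAttention(nn.Module):
    # Sketch of REGO-style refinement: queries from RoI features,
    # keys/values from DETR decoder embeddings (assumed d_model=256).
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, roi_feats, decoder_feats):
        # roi_feats: (B, N_roi_tokens, d_model), flattened enlarged-RoI features
        # decoder_feats: (B, N_queries, d_model), DETR decoder outputs
        refined, _ = self.attn(query=roi_feats,
                               key=decoder_feats,
                               value=decoder_feats)
        return refined  # glimpse features for the next refinement stage

glimpse = GlimpseAttention()
roi = torch.randn(2, 49, 256)   # e.g., 7x7 pooled RoI tokens per image
dec = torch.randn(2, 100, 256)  # 100 DETR object queries
out = glimpse(roi, dec)
print(out.shape)  # torch.Size([2, 49, 256])
```

Because the queries are restricted to RoI features, each refinement stage only has to model local context around a candidate box, which is the "glimpse" in the method's name.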


REGO generally improves both the training speed and the accuracy of DETR variants (e.g., DETR and Deformable DETR).


Takeaway: effective locality modeling is important for reducing the training difficulty of attention in DETR.