On June 19, CVPR 2022 will open. At this year's conference, the AMD AI R&D team from Beijing once again had two papers accepted: "Dynamic Sparse R-CNN" and "Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification" (download links are attached at the end of this article).
This is the third consecutive year that the team has had papers accepted at CVPR, a top academic conference. Drawing on cutting-edge research in computer vision algorithms, the AMD AI R&D team continues to deliver innovative application value to industries such as autonomous driving, smart cities, smart security, and smart retail, helping the many customers the team serves build leading Vitis AI full-stack solutions.
Hong Qinghang, AMD Algorithm Intern (left)
Liu Fengming, AMD Algorithm Intern (right)
Sparse R-CNN is a recent high-performance object detector that predicts with learnable sparse candidate boxes and candidate features, without anchors or reference points. In the paper, the team proposes two dynamic strategies to improve Sparse R-CNN.
• First, Sparse R-CNN uses a one-to-one label assignment scheme, i.e., the Hungarian algorithm assigns a single positive sample to each ground-truth (GT) box. We therefore propose a dynamic label assignment algorithm that assigns multiple positive samples to each GT and gradually increases the number of positives across the stages of the cascade structure (a minimal sketch follows this list).
• Second, during inference, Sparse R-CNN keeps the initial candidate boxes and features unchanged for different input images. Inspired by dynamic convolution, the paper proposes a dynamic candidate box generation algorithm that optimizes the initial candidate boxes by dynamically combining multiple generators.
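To make the first strategy concrete, here is a minimal PyTorch-style sketch of a many-to-one, top-k label assignment in which the number of positives grows with the cascade stage. It assumes a precomputed matching cost matrix; the function name, the linear growth of k, and the naive conflict handling are illustrative assumptions, not the authors' implementation.

```python
import torch

def dynamic_topk_assign(cost: torch.Tensor, stage: int, base_k: int = 1) -> torch.Tensor:
    """Assign each ground-truth (GT) box its k lowest-cost candidate boxes.

    cost  : (num_candidates, num_gt) matching cost, e.g. cls + L1 + GIoU terms
    stage : cascade stage index; deeper stages receive more positives
    Returns a (num_candidates,) tensor of GT indices, with -1 for background.
    """
    num_candidates, num_gt = cost.shape
    k = min(base_k + stage, num_candidates)  # more positives in later stages
    assign = torch.full((num_candidates,), -1, dtype=torch.long)
    # cheapest k candidates per GT column
    topk = torch.topk(cost, k, dim=0, largest=False).indices  # (k, num_gt)
    for gt in range(num_gt):
        assign[topk[:, gt]] = gt  # naive conflict handling: later GT wins;
                                  # a real matcher would keep the min-cost GT
    return assign
```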
Experiments show that the proposed Dynamic Sparse R-CNN detector further improves on Sparse R-CNN. With ResNet-50 as the backbone network, Dynamic Sparse R-CNN reaches 47.2% AP on the COCO validation set, surpassing Sparse R-CNN by 2.2 percentage points.
This innovative approach offers several advantages:
• It inherits the strengths of Sparse R-CNN, such as learnable sparse candidate boxes and a self-attention mechanism that relates different candidate boxes;
• Most mainstream Transformer-based detectors use a one-to-one label assignment scheme, i.e., the Hungarian algorithm assigns a single positive sample to each GT, which may not be efficient enough for optimizing the detector. Inspired by optimal transport theory, the proposed method assigns multiple positive samples to each GT and gradually increases the number of positives across the cascade stages. Experiments show that this dynamic label assignment algorithm brings a significant accuracy improvement.
• The candidate box set learned by Sparse R-CNN encodes statistics of likely object locations in the training set and stays fixed regardless of the test image. Inspired by dynamic convolution, the proposed method optimizes the initial candidate boxes by dynamically combining multiple generators, so that the candidate boxes adapt to each input image, further improving detection accuracy (see the sketch after this list).
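For the second strategy, below is a minimal sketch of input-adaptive candidate box generation in the spirit of dynamic convolution: several learnable box sets are mixed with image-conditioned weights. The module name, layer sizes, and the pooled-feature conditioning are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DynamicProposalGenerator(nn.Module):
    """Mix several learnable candidate box sets with image-conditioned weights."""

    def __init__(self, num_generators: int = 4, num_boxes: int = 100, feat_dim: int = 256):
        super().__init__()
        # each generator holds its own learnable boxes (cx, cy, w, h in [0, 1])
        self.boxes = nn.Parameter(torch.rand(num_generators, num_boxes, 4))
        # small head mapping a pooled image feature to softmax mixing weights
        self.weight_head = nn.Sequential(
            nn.Linear(feat_dim, num_generators), nn.Softmax(dim=-1))

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        # image_feat: (B, feat_dim) globally pooled backbone feature
        w = self.weight_head(image_feat)                   # (B, G) per-image weights
        # weighted sum over generators -> image-adaptive candidate boxes
        return torch.einsum("bg,gnc->bnc", w, self.boxes)  # (B, num_boxes, 4)
```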
Object detection is a fundamental task in computer vision with important applications in areas such as autonomous driving and smart cities. Using only a single ResNet-50 backbone, the algorithm proposed in this paper achieves world-leading accuracy: 47.2% AP on the COCO validation set, surpassing the Sparse R-CNN baseline by 2.2 percentage points and outperforming other mainstream CNN-based and Transformer-based detectors.
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
Zhu Haowei, AMD Algorithm Intern (left)
Ke Wenjing, AMD Algorithm Engineer (right)
Fine-grained recognition and object re-identification are important tasks in smart cities. Fine-grained recognition aims to distinguish different subcategories of objects within the same category, while object re-identification must identify the same pedestrian or vehicle across cameras. Because there are only subtle visual differences between subcategories, both tasks are more challenging than general image classification.
Recently, Transformer models based on the self-attention mechanism have shown superior performance on various NLP and CV tasks. Self-attention can capture sequential features and aggregate global information. In this work, we design a Dual Cross-Attention Learning method for fine-grained recognition and object re-identification to better learn the subtle features needed to recognize fine-grained objects, such as different bird species or pedestrian identities.
The dual cross-attention proposed in this paper includes global-local cross-attention and pair-wise cross-attention.
• Global-local cross-attention builds interaction between global features and local salient features: locating salient regions and then learning local salient features strengthens the spatial learning of discriminative features (a minimal sketch follows this list).
• Pair-wise cross-attention builds feature interaction between image pairs: mixing a distractor image's features into the learning of the target image's features acts as a regularizer and effectively alleviates over-fitting.
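As a rough illustration of the global-local idea, the sketch below selects the most salient patch tokens (here scored by their attention to the class token) and lets them cross-attend to the full global token sequence. The saliency criterion, the selection ratio, and all sizes are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GlobalLocalCrossAttention(nn.Module):
    """Let the top-k salient local tokens query the full global token sequence."""

    def __init__(self, dim: int = 384, heads: int = 6, local_ratio: float = 0.3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_ratio = local_ratio

    def forward(self, tokens: torch.Tensor, cls_attn: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) patch tokens; cls_attn: (B, N) saliency per token
        k = max(1, int(tokens.shape[1] * self.local_ratio))
        idx = torch.topk(cls_attn, k, dim=1).indices  # (B, k) most salient tokens
        local = torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
        # queries = local salient tokens, keys/values = full global sequence
        out, _ = self.attn(local, tokens, tokens)
        return out  # (B, k, dim) enhanced local features
```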
The method yields clear accuracy improvements on mainstream fine-grained classification and object re-identification datasets. For example, on the MSMT17 dataset it improves mAP over the DeiT-Tiny and ViT-Base baselines by 2.8% and 2.4%, respectively.
The unique value of this research lies in:
• Extensive validation experiments were conducted with different baseline methods on mainstream fine-grained recognition and object re-identification datasets. The proposed method improves on each baseline and achieves the best performance compared with existing methods.
• Compared with conventional self-attention, the method innovatively proposes two cross-attention modules that, through global-local and pair-wise feature interaction during training, help the model mine discriminative and complementary features in images.
• The proposed cross-attention modules are easy to implement and compatible with existing vision Transformer baseline methods.
In addition, the proposed method applies to tasks such as fine-grained classification, pedestrian re-identification, and vehicle re-identification, and has application value in smart cities, smart security, smart retail, and other fields. The pair-wise cross-attention module is a plug-and-play component used only during Transformer training: it further improves accuracy without changing model inference, which allows fast model updates (a sketch follows).
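To illustrate that training-only, plug-and-play nature, the sketch below has the target image's tokens attend over keys and values concatenated from itself and a distractor image during training, while falling back to plain self-attention at inference so the deployed model is unchanged. Names and shapes are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class PairWiseCrossAttention(nn.Module):
    """Training-only regularizer: attend over target + distractor keys/values."""

    def __init__(self, dim: int = 384, heads: int = 6):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, target: torch.Tensor, distractor: torch.Tensor) -> torch.Tensor:
        # target, distractor: (B, N, dim) token sequences of an image pair
        if not self.training:
            # inference path: plain self-attention, so deployment is unchanged
            out, _ = self.attn(target, target, target)
            return out
        # training path: some attention mass is "spent" on the distractor,
        # which regularizes the target image's feature learning
        kv = torch.cat([target, distractor], dim=1)  # (B, 2N, dim)
        out, _ = self.attn(target, kv, kv)
        return out
```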
To learn more about the two papers, the full texts can be downloaded by clicking their titles.
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
CVPR (the IEEE/CVF Conference on Computer Vision and Pattern Recognition) is an annual academic conference organized by IEEE. Together with ICCV and ECCV, it is regarded as one of the three top conferences in computer vision. CVPR covers a wide range of topics in computer vision and pattern recognition, including object recognition, image segmentation, motion prediction, 3D reconstruction, and deep learning. The acceptance criteria for CVPR papers are quite strict; reviewers judge papers on innovation, experimental results, and presentation.
This year, the number of papers submitted to the CVPR Organizing Committee reached a record 8,161, and a total of 2,067 papers were finally accepted, with an acceptance rate of 25.33%.