SHREC 2020 – Extended Monocular Image Based 3D Object Retrieval


Objective

Building on our SHREC 2019 - Monocular Image Based 3D Object Retrieval track [1], we have extended the number of categories from the initial 21 classes to 40 classes, resulting in a new dataset of 40,000 images and 12,732 models. We invite further evaluation of existing and new 3D retrieval methods on this more comprehensive SHREC'20 benchmark. In particular, we would like to bring unsupervised learning to 2D-image-based 3D retrieval, given its practical significance.


Introduction

Monocular image-based 3D object retrieval is a novel and challenging research topic in the field of 3D object retrieval. Given an RGB image captured in the real world, it aims to search for relevant 3D objects in a dataset. In recent years, the problem of cross-modal retrieval, whose fundamental challenge lies in the heterogeneity of the different data modalities, has gradually attracted researchers' attention, and work on bridging the gap between different domains has been proposed continuously. However, there is still relatively little work on 3D object retrieval based on 2D images, due to the discrepancy in both domain and modality.

To facilitate more innovative and interesting developments in this direction, we previously constructed a benchmark for monocular image-based 3D object retrieval at SHREC'19. That dataset has 21 classes for both 2D images and 3D objects, with 1,000 image samples per category and 7,690 3D objects in total. The track attracted 9 groups from 4 countries and the submission of 20 runs. While supervised learning achieved superior retrieval performance (best NN of 97.4%), unsupervised learning still lags well behind (best NN of 61.2%).

In practice, 3D models do not carry as many annotations as 2D monocular images, so it is important to be able to transfer knowledge from 2D monocular images to unlabeled 3D models. Considering this, together with the fact that 21 categories fall short of real-life scenarios, we have extended the original dataset to obtain the SHREC'20 benchmark. While retaining the original 21 classes, 19 classes have been added for both 2D images and 3D objects, increasing the number of 2D images to 40,000 and the number of 3D models to 12,732. For both the initial SHREC'19 benchmark and the extended SHREC'20 benchmark, an unsupervised learning task is posed alongside the supervised one. Specifically, we welcome the following proposals:

1) new methods applied to the initial SHREC'19 benchmark, which may bring exciting progress in supervised and unsupervised learning;
2) existing and new approaches evaluated on the extended SHREC'20 benchmark, in both supervised and unsupervised settings.

Participants should submit results for at least one of the tasks listed below:
1) 21-supervised: supervised methods with 21 categories;
2) 21-unsupervised: unsupervised methods with 21 categories;
3) 40-supervised: supervised methods with 40 categories;
4) 40-unsupervised: unsupervised methods with 40 categories.

Each task is restricted to the corresponding subset of the provided dataset, as further described in the following section.



Dataset

The dataset has 40 classes for both 2D images and 3D objects. The 2D images of the "Monocular Image Based 3D Object Retrieval" (MI3DOR) dataset were selected from ImageNet [2]. The total number of image samples is 40,000 (.JPEG), with 1,000 samples per class; we randomly selected 500 images per class for training and used the remainder for testing. The 3D objects were selected from the popular 3D datasets NTU [3], PSB [4], ModelNet40 [5], and ShapeNet [6], for a total of 12,732 models. We likewise randomly selected 50% of the samples per class for training and used the remaining data for testing; as some classes have relatively few samples, the split is not exactly even, so the training set contains 6,361 models (.OBJ) and the test set contains 6,371 models (.OBJ). Following [7], we rendered each OBJ model into 12 views with a resolution of 224x224 pixels (a sketch of this camera setup is given after Table 1). Table 1 summarizes the dataset.

Benchmark   Image   Model   View
Train       20000    6361    6361*12 =  76332
Test        20000    6371    6371*12 =  76452
Total       40000   12732   12732*12 = 152784

Table 1. Training and testing subsets of the dataset.
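
The 12-view rendering of [7] places virtual cameras every 30 degrees around the upright axis of the model, elevated 30 degrees from the ground plane, all pointing at the object center. The following Python sketch is our illustration of that camera setup, not the official rendering code; the orbit radius is an assumption.

    import numpy as np

    def camera_positions(radius=2.0, elevation_deg=30.0, n_views=12):
        """Camera centers on a circle around the upright (z) axis."""
        elev = np.deg2rad(elevation_deg)
        azim = np.deg2rad(np.arange(n_views) * 360.0 / n_views)
        x = radius * np.cos(elev) * np.cos(azim)
        y = radius * np.cos(elev) * np.sin(azim)
        z = np.full(n_views, radius * np.sin(elev))
        return np.stack([x, y, z], axis=1)

    def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
        """4x4 world-to-camera view matrix looking from eye toward target."""
        f = target - eye; f = f / np.linalg.norm(f)      # forward
        s = np.cross(f, up); s = s / np.linalg.norm(s)   # right
        u = np.cross(s, f)                               # camera up
        m = np.eye(4)
        m[0, :3], m[1, :3], m[2, :3] = s, u, -f
        m[:3, 3] = -m[:3, :3] @ eye
        return m

    # One view matrix per camera; feeding these to any OBJ renderer and
    # rasterizing each view at 224x224 reproduces the benchmark's view images.
    view_matrices = [look_at(eye) for eye in camera_positions()]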

For the unsupervised tasks, the training data of the 3D models is given without labels. Detailed information is shown in Table 2, where "w" denotes data with labels and "w/o" denotes data without labels; a hypothetical loader illustrating these settings follows the table.

Task              Source train   Source test   Target train   Target test
21-supervised     w              w/o           w              w/o
21-unsupervised   w              w/o           w/o            w/o
40-supervised     w              w/o           w              w/o
40-unsupervised   w              w/o           w/o            w/o

Table 2. Different label settings for different tasks.
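
To make these settings concrete, the hypothetical Python loader below keeps the source-domain (2D image) labels in every task but replaces the target-domain (3D model) training labels with None for the unsupervised tasks. The directory layout and the labels.txt format are illustrative assumptions, not the official organization of the benchmark files.

    from pathlib import Path

    def load_split(root, task="40-unsupervised"):
        """Return (source_train, target_train) as lists of (path, label) pairs;
        target labels become None for the unsupervised tasks (Table 2)."""
        def read(domain):
            lines = (Path(root) / domain / "train" / "labels.txt").read_text().splitlines()
            return [(path, int(label)) for path, label in (ln.split() for ln in lines)]
        source = read("image")             # 2D images: always labeled
        target = read("model")             # 3D models, 12 rendered views each
        if task.endswith("unsupervised"):  # 21-/40-unsupervised: strip labels
            target = [(path, None) for path, _ in target]
        return source, target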

This dataset contains 40 classes: besides the original 21 classes of SHREC'19, 19 new classes have been added, namely lamp, pillow, bowl, desk, ship, stool, train, toilet, sofa, door, tower, telephone, printer, remote_control, helmet, microwave, bag, bench, and cap. Fig. 1 shows the data distribution, and one example per class is shown in Fig. 2 and Fig. 3.

Fig. 1. Data distribution of the dataset.

Fig. 2. 2D image samples in the dataset.

Fig. 3. 3D model samples in the dataset.


Evaluation Method

For quantitative comparison, we adopt seven popular criteria to evaluate the retrieval algorithms: Precision-Recall curve (PR), Nearest Neighbor (NN), First Tier (FT), Second Tier (ST), F-measure (F), Discounted Cumulated Gain (DCG), and Average Normalized Modified Retrieval Rank (ANMRR).
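
As an illustration, the Python sketch below computes three of these scalar criteria (NN, FT, and ST) for a single query from its ranked retrieval list, following the standard SHREC definitions; averaging over all queries yields the reported scores. It is a minimal sketch, not the official evaluation code.

    import numpy as np

    def nn_ft_st(ranked_labels, query_label, n_relevant):
        """NN, FT, ST for one query.
        ranked_labels: class labels of the retrieved objects, best match first.
        n_relevant:    number of target-set objects sharing query_label (C)."""
        rel = np.asarray(ranked_labels) == query_label
        nn = float(rel[0])                            # top-1 result relevant?
        ft = rel[:n_relevant].sum() / n_relevant      # recall within the top C
        st = rel[:2 * n_relevant].sum() / n_relevant  # recall within the top 2C
        return nn, ft, st

    # Example with C = 4 relevant objects in the target set;
    # prints the (NN, FT, ST) triple: 1.0, 0.75, 1.0
    print(nn_ft_st(["cup", "cup", "desk", "cup", "lamp", "bowl", "cup"], "cup", 4))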


Procedure


Important Dates

  • February 21 - Call for participation, distribution of the database.
  • March 05 - Registration deadline (extended to April 15).
  • March 21 - Submission of the results and a one-page description of the method(s) (extended to April 21).
  • March 23 - Distribution of relevance judgments and evaluation scores (extended to April 23).
  • April 02 - Submission of the track report for review (extended to May 02).
  • May 01 - Reviews done, first stage decision on acceptance or rejection.
  • May 22 - First revision.
  • May 29 - Second stage decision on acceptance or rejection.
  • June 12 - Second revision.
  • June 19 - Final decision on acceptance or rejection.
  • September 1 - Publication online in Computers & Graphics.
  • September 4-5 - Eurographics Workshop on 3D Object Retrieval 2020, featuring SHREC 2020.

Organizers

    Institute of Television and Image Information: An-An Liu, Wei-Zhi Nie, Wen-Hui Li, Dan Song, Yu-Qian Li, He-Yu Zhou, Ting Zhang, Xiao-Qian Zhao
    State Key Lab of CAD&CG: Wei Chen
    JD AI Research: Wu Liu

    References

    [1] W. Li, A. Liu, W. Nie, D. Song, et al., "SHREC 2019 - Monocular Image Based 3D Model Retrieval," in 12th Eurographics Workshop on 3D Object Retrieval, 3DOR 2019. Eurographics Association, 2019, pp. 1–8.
    [2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Fei-Fei Li: ImageNet: A large-scale hierarchical image database. CVPR 2009: 248-255.
    [3] Ding-Yun Chen, Xiao-Pei Tian, Yu-Te Shen, Ming Ouhyoung. On Visual Similarity Based 3D Model Retrieval. Comput. Graph. Forum 22(3): 223-232, 2003.
    [4] Philip Shilane, Patrick Min, Michael M. Kazhdan, Thomas A. Funkhouser. The Princeton Shape Benchmark. SMI 2004: 388.
    [5] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao, 3D ShapeNets: A deep representation for volumetric shapes. CVPR 2015: 1912-1920.
    [6] Manolis Savva, Fisher Yu, Hao Su, Asako Kanezaki, Takahiko Furuya, Ryutarou Ohbuchi, Zhichao Zhou, et al. Large-Scale 3D Shape Retrieval from ShapeNet Core55. 3DOR 2017.
    [7] Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik G. Learned-Miller: Multi-view Convolutional Neural Networks for 3D Shape Recognition. ICCV 2015: 945-953.



    Citation

    If you use any of the results or data of this dataset, please cite the following papers:

    @article{Liutcyb2018,
      author    = {An{-}An Liu and
                   Weizhi Nie and
                   Yue Gao and
                   Yuting Su},
      title     = {View-Based 3-D Model Retrieval: {A} Benchmark},
      journal   = {{IEEE} Trans. Cybernetics},
      volume    = {48},
      number    = {3},
      pages     = {916--928},
      year      = {2018}
    }
    
    @article{Liutip2016,
      author    = {Anan Liu and
                   Weizhi Nie and
                   Yue Gao and
                   Yuting Su},
      title     = {Multi-Modal Clique-Graph Matching for View-Based 3D Model Retrieval},
      journal   = {{IEEE} Trans. Image Processing},
      volume    = {25},
      number    = {5},
      pages     = {2103--2116},
      year      = {2016}
    }
    
    @inproceedings{Liuijcai2018,
      author    = {Anan Liu and
                   Shu Xiang and
                   Wenhui Li and
                   Weizhi Nie and
                   Yuting Su},
      title     = {Cross-Domain 3D Model Retrieval via Visual Domain Adaption},
      booktitle = {Proceedings of the Twenty-Seventh International Joint Conference on
                   Artificial Intelligence, {IJCAI} 2018, July 13-19, 2018, Stockholm,
                   Sweden.},
      pages     = {828--834},
      year      = {2018}
    }
    
    @inproceedings{zhou2019dual,
      author       = {Zhou, Heyu and
                      Liu, An-An and
                      Nie, Weizhi},
      title        = {Dual-level Embedding Alignment Network for 2D Image-Based 3D Object Retrieval},
      booktitle    = {Proceedings of the 27th ACM International Conference on Multimedia},
      pages        = {1667--1675},
      year         = {2019},
      organization = {ACM}
    }
    

    Acknowledgement

    National Natural Science Foundation of China (61772359, 61872267, 61902277);
    Tianjin Research Program of Application Foundation and Advanced Technology.



    Last update: 28/01/2019