Rules and policies

    General
  1. The MICCAI Learn2Learn Challenge will investigate cross-domain few-shot learning in different medical imaging domains.
  2. The goal of our challenge is to investigate optimal universal learning algorithms, rather than to solve a single specific task.
  3. The challenge, along with the leaderboard, is hosted at https://www.l2l-challenge.org/
    Participation
  4. The challenge is open to everyone, except for the organisers and members of the organisers' groups.
  5. Organisers and members of organisers' groups may participate but will not be eligible for awards. Employees from other labs or departments of the organisers' institutions may participate and will be eligible for awards.
  6. Participating teams may consist of one or more individuals. Each individual may only be part of one team. The creation of multiple accounts to circumvent this rule is not permitted.
  7. Each team member must register individually on the challenge platform with their full name and affiliation. One team member will create a team and invite the other team members to join. This process must be completed before the registration deadline.
  8. All valid results will be announced publicly through the leaderboard. The teams of the three top-performing methods will be invited to present their work in person at MICCAI 2023.
  9. We aim to publish an analysis of the challenge results. The three top-performing teams will be invited to contribute to this publication. Moreover, the challenge organisers may invite additional teams with particularly novel or interesting algorithms to contribute.
  10. Participating teams are free to publish their own algorithms and results separately but may only reference the overall challenge results once the challenge paper is published.
    Data
  11. The participants may use the data we provide, i.e. our meta-dataset, for training. Additionally, a set of publicly available and commonly used computer vision datasets may be used, specifically:
    • ImageNet (ILSVRC 2012), miniImageNet, tieredImageNet
    • CIFAR-100, CIFAR-FS
    • MSCOCO
    • Omniglot
  12. The use of openly available pre-trained neural networks trained exclusively on the above datasets is also permitted. Any pre-trained networks used, as well as the data used for training, must be reported in the post-submission report.
  13. The target of each task varies depending on the individual task. Our meta-dataset contains tasks related to various diseases, anomalies, patient features, and image quality features, with different types of classification targets, including binary classification, multi-class classification, multi-label classification, and ordinal regression.
  14. We define a few-shot task as a specific classification task derived from one of our datasets. We define a task instance as a specific instantiation of this task with N labelled examples per class. A few-shot task instance contains a support set of labelled images, a query set of unlabelled images for which labels should be predicted, and a task target, e.g. multi-class classification (see the illustrative sketch after this list). One test case corresponds to one task instance, i.e. training on a specific support set of N samples and evaluating on the corresponding query set.
  15. The test data will contain previously unseen data-scarce scenarios (not contained in the meta-training data) that bear resemblance to the meta-training data but are distinct from it. An example could be a novel detection task for a rare anomaly (e.g. spina bifida in fetal ultrasound screening) for which it is difficult to gather large amounts of training data.
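To make the terminology concrete, the sketch below models a few-shot task instance as a plain dataclass. It only illustrates the structure described in the list above; the class and attribute names are assumptions for illustration and do not reflect the actual torchcross.data.Task interface.

```python
# Illustrative sketch of a few-shot task instance as described above.
# This is NOT the actual torchcross.data.Task definition; the class and
# attribute names here are assumptions for illustration only.
from dataclasses import dataclass
from enum import Enum, auto

import torch


class TaskTarget(Enum):
    BINARY_CLASSIFICATION = auto()
    MULTICLASS_CLASSIFICATION = auto()
    MULTILABEL_CLASSIFICATION = auto()
    ORDINAL_REGRESSION = auto()


@dataclass
class FewShotTaskInstance:
    support_images: torch.Tensor  # (N * n_classes, C, H, W) labelled support examples
    support_labels: torch.Tensor  # labels for the support set
    query_images: torch.Tensor    # unlabelled query examples whose labels are predicted
    task_target: TaskTarget       # e.g. TaskTarget.MULTICLASS_CLASSIFICATION
```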
    Submission and evaluation
  16. The algorithm assessment is fully automatic. The preliminary evaluation engine and the final evaluation will be run on the challenge servers.
  17. Participating teams are expected to submit their algorithms as a Docker or Singularity container. The container should be able to accept a few-shot task instance, containing a support set of labelled images and a query set of unlabelled images, and output predictions for the query set (a minimal entry-point sketch is given after this list).
  18. Few-shot task instances will be passed to the container as torchcross.data.Task objects in pickled files.
  19. The resulting predictions for the query set should be written as a pickled torch tensor and saved in the given output directory.
  20. The organisers will provide readers for the input and writers for the output in the form of Python classes. The participating teams are free to use these readers and writers or to implement their own.
  21. The container should be able to process a task instance within 2 minutes on a single A100 GPU. If a task instance times out, the accuracy for this instance will be set to 0 for the ranking.
  22. The evaluation software will be made publicly available.
  23. Submissions can be tested for viability by running the submitted algorithm on the challenge servers using a private validation set.
  24. Submissions are allowed up to three times per week, with an exception made on the final day of the challenge, when teams may submit up to three times regardless of any previous submissions.
  25. Participating teams are allowed to select up to three submissions as final, of which the best-performing one will be used for the team's ranking.
  26. For the final testing stage, we will adjust the execution time limit, allowing a generous margin of error.
  27. Each testing instance consists of a small support set of N labelled images (N in {3, 5, 7, 10} for the final ranking) and a query set of unlabelled images. Both the query and the support set are from a previously unseen dataset. The algorithms can use the labelled support samples to adapt a classifier to the task, which is then used to predict labels for the unlabelled samples.
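As a rough picture of the expected input/output handling, the sketch below shows one possible container entry point: it unpickles each task file, runs a placeholder predictor, and pickles one prediction tensor per task into the output directory. The file naming, directory layout, and the attributes accessed on the task object are assumptions; in practice, the readers and writers provided by the organisers should be preferred.

```python
# Minimal sketch of a container entry point. The file layout, naming, and the
# attributes accessed on the unpickled task object are assumptions for
# illustration; prefer the readers/writers provided by the organisers.
import pickle
import sys
from pathlib import Path

import torch


def predict(task) -> torch.Tensor:
    """Adapt a classifier on the support set, then predict labels for the query set.

    Placeholder only: returns a zero prediction per query example.
    """
    n_query = len(task.query)  # assumed attribute; the real Task API may differ
    return torch.zeros(n_query, dtype=torch.long)


def main(input_dir: str, output_dir: str) -> None:
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for task_file in sorted(in_dir.glob("*.pkl")):  # assumed naming convention
        with open(task_file, "rb") as f:
            task = pickle.load(f)                   # pickled torchcross.data.Task
        predictions = predict(task)
        with open(out_dir / task_file.name, "wb") as f:
            pickle.dump(predictions, f)             # pickled torch tensor


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```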
    Metrics and ranking
  28. The challenge will be evaluated using mean accuracy over several few-shot task instances. The sampled few-shot tasks will include different numbers of shots, i.e. examples per class. Since we believe that a range of 3 to 10 shots is the most relevant for our application, we include this range in the ranking.
  29. For each submission, the accuracies of all sampled few-shot instances of each test task are averaged. These average accuracies are used to generate a ranking for each task, and these per-task rankings are then aggregated into the final challenge ranking (sketched below).
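The following sketch illustrates this two-step scheme: mean accuracy per test task, then per-task ranks aggregated across tasks. Aggregating the per-task ranks by their mean (and the simple tie handling) is an assumption about implementation detail, not the official evaluation code.

```python
# Sketch of the ranking scheme described above: average the accuracy over all
# sampled instances of each test task, rank the submissions per task, then
# aggregate the per-task ranks. Mean-rank aggregation is an assumption.
from collections import defaultdict
from statistics import mean


def rank_submissions(accuracies: dict[str, dict[str, list[float]]]) -> list[str]:
    """accuracies[submission][task] holds the accuracies over all sampled instances."""
    # 1. Mean accuracy per (submission, task).
    mean_acc = {
        sub: {task: mean(vals) for task, vals in tasks.items()}
        for sub, tasks in accuracies.items()
    }
    # 2. Rank submissions within each task (rank 1 = highest mean accuracy).
    per_task_ranks: dict[str, list[int]] = defaultdict(list)
    for task in next(iter(mean_acc.values())):
        ordered = sorted(mean_acc, key=lambda s: mean_acc[s][task], reverse=True)
        for rank, sub in enumerate(ordered, start=1):
            per_task_ranks[sub].append(rank)
    # 3. Aggregate per-task ranks (assumed: mean rank) into the final ordering.
    return sorted(per_task_ranks, key=lambda s: mean(per_task_ranks[s]))
```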
    Reproducibility
  30. Participating teams are expected to make their methods fully reproducible. This includes the availability of code, any additional data used, as well as instructions on how to replicate the results.
  31. After the submission deadline for results, participating teams need to submit a short report of one to three pages describing their algorithm. The report should contain a brief description of the methods used in the three final submissions, including the model, the data used for training, and the hyperparameter settings. Teams are strongly encouraged to also include a link to their code.