The complete evaluation metrics and scheme will soon be available.
Overall, performance of the methods will be assessed on two levels:
- New lesion detection: in short, how many individual new lesions in the ground truth were detected by the evaluated method, independently of the precision of their contours.
- New lesion segmentation: in short, how well the lesions in the ground truth overlap with those of the evaluated method. This accounts for the accuracy of the contours. These criteria include scores such as the Dice measure.
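To make the difference between the two levels concrete, here is a minimal sketch in Python. It is not the official scoring code: the function names (`dice`, `connected_components`, `lesion_sensitivity`), the 2D toy masks, the 4-connectivity, and the rule that a lesion counts as detected as soon as one voxel overlaps are all assumptions made for illustration. The actual evaluation tool may apply different connectivity and overlap rules.

```python
from collections import deque


def dice(gt, pred):
    """Dice overlap between two binary masks given as sets of voxel coordinates."""
    if not gt and not pred:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(gt & pred) / (len(gt) + len(pred))


def connected_components(voxels):
    """Split a set of (x, y) voxels into 4-connected groups (one per lesion)."""
    remaining = set(voxels)
    lesions = []
    while remaining:
        seed = remaining.pop()
        lesion, queue = {seed}, deque([seed])
        while queue:
            x, y = queue.popleft()
            for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    lesion.add(neighbour)
                    queue.append(neighbour)
        lesions.append(lesion)
    return lesions


def lesion_sensitivity(gt, pred):
    """Fraction of ground-truth lesions hit by at least one predicted voxel.

    The "any overlap" detection rule is an illustrative assumption.
    """
    lesions = connected_components(gt)
    if not lesions:
        return 1.0  # convention assumed here: nothing to detect
    detected = sum(1 for lesion in lesions if lesion & pred)
    return detected / len(lesions)


# Toy ground truth with two new lesions: a 2x2 block and a 2-voxel lesion.
truth = {(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6)}
# Toy prediction that touches both lesions but with poor contours.
found = {(0, 0), (5, 5)}

print("lesion sensitivity:", lesion_sensitivity(truth, found))
print("Dice:", dice(truth, found))
```

In this toy case every ground-truth lesion is touched by the prediction, so the detection score is perfect, while the Dice score is penalised because most lesion voxels are missed. This is why the two levels are reported separately.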
One important thing to note is that the ground truth contains only purely new lesions. Thus, we do not ask you to find growing, shrinking or disappearing lesions, and in fact ask you not to. The subject of interest really is the purely new lesions. The consensus has been built from four expert segmentations, with validation by a “super-expert” for lesions with discrepancies. A fuller description is available on the data page.
The evaluation will be carried out, similarly to the 2016 challenge, using the segmentation performance analyzer tool available in Anima (animaSegPerfAnalyzer). Its code and binary versions are available open-source from the downloads page of Anima. Challengers may use this tool during the training phase. Its documentation is available here: https://anima.readthedocs.io/en/latest/segmentation.html#segmentation-performance-analyzer