A more complete description of the evaluation metrics is provided in the following PDF: MS_Challenge_Evaluation_Challengers.
Overall, the performance of the methods will be assessed on two levels:
- New lesion detection: in short, how many individual new lesions in the ground truth were detected by the evaluated method, independently of the precision of their contours.
- New lesion segmentation: in short, how well the lesions in the ground truth overlap with those of the evaluated method. This accounts for the accuracy of the contours. These criteria include scores such as the Dice measure (a rough illustrative sketch is given after this list).
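For intuition only, the sketch below approximates these two levels on binary masks: a voxel-wise Dice score and a simple lesion-wise detection rate based on connected components. The file names are placeholders and the one-voxel overlap rule is an assumption; the official scores are those computed by animaSegPerfAnalyzer as described below.

```python
# Illustrative approximation of the two evaluation levels; not the official Anima metrics.
import numpy as np
import nibabel as nib
from scipy import ndimage

def dice(pred, ref):
    """Voxel-wise Dice overlap between two binary masks."""
    intersection = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

def lesion_detection_rate(pred, ref, min_overlap_voxels=1):
    """Fraction of ground-truth lesions (connected components) touched by the prediction."""
    labels, n_lesions = ndimage.label(ref)
    if n_lesions == 0:
        return None  # empty ground truth: the challenge uses specific metrics for this case
    detected = sum(
        np.logical_and(labels == i, pred).sum() >= min_overlap_voxels
        for i in range(1, n_lesions + 1)
    )
    return detected / n_lesions

# Placeholder file names for a predicted and a consensus new-lesion mask.
pred = nib.load("prediction.nii.gz").get_fdata() > 0.5
ref = nib.load("ground_truth.nii.gz").get_fdata() > 0.5
print("Dice:", dice(pred, ref))
print("Lesion detection rate:", lesion_detection_rate(pred, ref))
```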
One important thing to note is that the ground truth contains only purely new lesions. Thus, we do not ask you to find growing, shrinking or disappearing lesions (and in fact ask you not to). The subject of interest really is the purely new lesions. The consensus has been built from four expert segmentations, with validation by a “super-expert” for lesions with discrepancies. More details are available on the data page.
The evaluation will be carried out, similarly to the 2016 challenge, using the segmentation performance analyzer tool available in Anima (animaSegPerfAnalyzer), whose code is available open-source from the downloads page of Anima (use the source code version for the latest updates). Some patients do not have new lesions. For these patients, specific metrics, detailed in the document above, will be computed. These metrics are computed automatically by animaSegPerfAnalyzer when the ground truth is empty.
The validation tools may be used by the challengers during the training phase. Their documentation is available here: https://anima.readthedocs.io/en/latest/segmentation.html#segmentation-performance-analyzer
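As a convenience, a minimal sketch of calling the analyzer from Python during training is shown below. The flag names (-i for the test segmentation, -r for the reference, -o for the output prefix) and the file names are assumptions to be checked against the documentation linked above; this is not a verified command line.

```python
# Sketch of invoking animaSegPerfAnalyzer on one case during the training phase.
# Flags and file names are assumptions; see the Anima documentation for the exact options.
import subprocess

subprocess.run(
    [
        "animaSegPerfAnalyzer",
        "-i", "prediction.nii.gz",    # segmentation produced by your method (placeholder)
        "-r", "ground_truth.nii.gz",  # consensus new-lesion ground truth (placeholder)
        "-o", "scores_patient01",     # assumed prefix for the output score files
    ],
    check=True,
)
```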