The complete evaluation metrics and evaluation scheme are available to registered users on this page.
Overall, performance of the methods will be assessed on three levels:
- Lesion detection: in short, how many individual lesions in the ground truth are detected by the evaluated method, regardless of the precision of their contours.
- Lesion segmentation: in short, how well the lesions in the ground truth overlap with those of the evaluated method. This accounts for the accuracy of the contours. These criteria will include scores such as the Dice coefficient, precision, recall and related measures.
- Computation performance and modalities used: how much time the method took to produce its result, whether the algorithm is multi-threaded, etc. In addition, challengers are encouraged to use as few modalities as possible. Although this will not be directly included in the performance metrics, robustness to the absence of some modalities will be seen as a bonus.
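The detection and segmentation criteria above can be sketched for binary lesion masks as follows. This is a minimal illustration, not the challenge's official implementation: the function names and the overlap criterion for counting a lesion as "detected" (at least one overlapping voxel with a ground-truth connected component) are assumptions made here for clarity.

```python
import numpy as np
from scipy import ndimage


def segmentation_scores(gt, pred):
    """Voxel-wise Dice, precision and recall for two binary masks."""
    gt = gt.astype(bool)
    pred = pred.astype(bool)
    tp = np.logical_and(gt, pred).sum()          # true-positive voxels
    denom = gt.sum() + pred.sum()
    dice = 2.0 * tp / denom if denom else 1.0
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    return dice, precision, recall


def detected_lesions(gt, pred):
    """Count ground-truth lesions (connected components) that overlap
    the prediction by at least one voxel, independently of contour
    precision. Returns (detected, total)."""
    labels, n = ndimage.label(gt.astype(bool))
    hit = sum(1 for i in range(1, n + 1)
              if np.logical_and(labels == i, pred).any())
    return hit, n
```

For example, a prediction that perfectly segments one of two ground-truth lesions and misses the other would score a recall of 0.5 at the voxel level and a detection count of 1 out of 2 at the lesion level.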
All these criteria will be evaluated using the seven manual expert annotations provided for evaluation. Consensus-based evaluation will thus be used: an inter-expert ground truth will be computed from the individual annotations (using LOP STAPLE). The generated ground truth will be used both to compute the metrics and to assess the relative performance of the methods with respect to that of the experts.
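To illustrate the idea of building a consensus ground truth from several expert annotations, here is a simple majority-vote sketch. Note that this is a deliberate simplification: LOP STAPLE, the algorithm actually used, additionally estimates and weights each expert's performance rather than counting all votes equally.

```python
import numpy as np


def majority_consensus(annotations):
    """Majority vote across expert binary masks: a voxel belongs to the
    consensus if strictly more than half of the experts labelled it as
    lesion. (A simplified stand-in for LOP STAPLE, which also weights
    experts by their estimated reliability.)"""
    stack = np.stack([a.astype(bool) for a in annotations])
    votes = stack.sum(axis=0)                 # per-voxel vote count
    return votes > stack.shape[0] / 2.0
```

With seven experts, a voxel marked as lesion by four or more experts enters the consensus, while a voxel marked by only three does not.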