
62/474/NP Machine Learning-enabled Medical Device – Performance Evaluation Process

September 8, 2023

Comment period start date:


Comment period end date:



This document defines a standardized performance evaluation process for Machine Learning-enabled Medical Devices (MLMD). The set of processes, activities, and tasks described in this document establishes a common framework for MLMD performance evaluation.


Evaluating ML performance is crucial to measuring an MLMD’s overall performance. Performance is one of the critical quality characteristics that should be measured throughout the MLMD lifecycle, from initial modeling through product operation to decommissioning.

The MLMD product lifecycle should include ML-specific activities such as data management, ML modeling, model learning, verification, model tuning and final test, deployment, and updating. An MLMD also needs to be evaluated across the total product lifecycle to validate that the relevant performance characteristics meet the quality requirements.

• MLMDs have characteristics not covered by current evaluation processes:

One of the most significant benefits of MLMD is the potential for further learning and performance improvement as more data becomes available, including synthetic data or real-world data processed during the MLMD’s operation.

More generally, the development of MLMDs involves the use of ML algorithms, models, and data. The ML models themselves are often regarded as a “black box”, whose output may be clinically significant. Current software lifecycle processes do not provide standardized methods for evaluating the various characteristics of MLMDs (including the ML model as a “black box”).

• This standard should close existing gaps when it comes to demonstrating the safety and effectiveness of MLMDs:

This document specifies an evaluation process that can assist manufacturers of MLMDs in evaluating overall ML-related performance, which includes technical and clinical performance. It can also be used to monitor performance so that the intended performance is maintained across the total MLMD lifecycle.

Three pillars of MLMD performance are proposed for this purpose and further explained in the standard:

  • Scientific validity
  • Technical (analytical) performance
  • Clinical performance

To achieve this purpose, this document builds on established terms and concepts from IMDRF and medical device standards, while taking into account relevant AI-specific standards. One of the foundations for this document is IMDRF N41, Software as a Medical Device (SaMD): Clinical Evaluation. Although IMDRF N41 is scoped to SaMD, whereas an MLMD can also be Software in a Medical Device (SiMD), the concepts described in IMDRF N41 can be applied for the purpose of this document and therefore beyond SaMD.

Please find the webpage here.

Ben Kemp