Purely technical scoring of predictive power as a measure of the accuracy of a prediction (typically via ROC/AUC, the Receiver Operating Characteristic / Area Under the Curve) may conceal design problems, most notably in the AI training data: garbage in, hallucinations out.
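To make the point concrete, the following is a minimal sketch in Python, using entirely hypothetical synthetic data (the subgroups, scores and effect sizes are illustrative assumptions, not drawn from any real clinical model). It shows how a respectable headline AUC can coexist with near-chance performance for one subgroup:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000  # hypothetical individuals per subgroup

# Hypothetical risk scores for two subgroups, A and B: the score
# separates cases from non-cases well for A, but barely for B.
y_a = rng.integers(0, 2, n)                # outcomes, group A
y_b = rng.integers(0, 2, n)                # outcomes, group B
score_a = 1.5 * y_a + rng.normal(0, 1, n)  # informative for A
score_b = 0.2 * y_b + rng.normal(0, 1, n)  # near-noise for B

y_all = np.concatenate([y_a, y_b])
score_all = np.concatenate([score_a, score_b])

print(f"pooled AUC : {roc_auc_score(y_all, score_all):.3f}")
print(f"group A AUC: {roc_auc_score(y_a, score_a):.3f}")
print(f"group B AUC: {roc_auc_score(y_b, score_b):.3f}")
```

On this construction the pooled figure sits comfortably between the two group-level values, so a purely aggregate audit would pass a model whose score is barely better than a coin toss for group B.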
This relates to the extent to which these models learn from data that takes account of the wider social context in which ill-health sits. It remains to be determined whether AI models are trained to embody any particular social reality, rather than merely how well they identify individuals within defined clinical parameters. Those parameters may themselves carry forms of bias (introduced by the trainers and by the selection of the data itself) and produce discrimination (through the way the model operates).
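A second sketch, again built on purely hypothetical data, illustrates how the selection of the training data itself can translate into discriminatory behaviour: where one group is under-represented and its ill-health depends on a different signal, a model fitted to the pooled cohort will systematically miss that group's genuine cases. The cohort sizes, the two-feature construction and the `make_cohort` helper are assumptions made for illustration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_cohort(n, informative):
    """Hypothetical cohort: ill-health depends on a different
    feature in each group (column 0 for A, column 1 for B)."""
    x = rng.normal(0, 1, (n, 2))
    y = rng.binomial(1, 1 / (1 + np.exp(-3 * x[:, informative])))
    return x, y

# Selection bias: group B is heavily under-represented in training.
x_a, y_a = make_cohort(9000, informative=0)
x_b, y_b = make_cohort(1000, informative=1)
model = LogisticRegression().fit(np.vstack([x_a, x_b]),
                                 np.concatenate([y_a, y_b]))

def false_negative_rate(x, y):
    """Share of genuinely ill individuals the model fails to flag."""
    missed = (model.predict(x) == 0) & (y == 1)
    return missed.sum() / (y == 1).sum()

# Balanced test cohorts expose the disparity the pooled fit hides.
xt_a, yt_a = make_cohort(5000, informative=0)
xt_b, yt_b = make_cohort(5000, informative=1)
print(f"group A false negative rate: {false_negative_rate(xt_a, yt_a):.2f}")
print(f"group B false negative rate: {false_negative_rate(xt_b, yt_b):.2f}")
```

Under these assumptions the model misses a far larger share of group B's genuine cases than group A's, even though nothing in an aggregate accuracy figure would flag the disparity.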
The concern here is that the use of machine learning is not merely a matter of assessing models for technical accuracy, but of understanding whether their use is compatible with our notions of 'social justice', and with the fact that humans are individuals about whom one cannot simply generalise in the way that machine learning training does.