WG-P1 How poor data hygiene leads towards false AI models in biomedicine
Description
A mistrained AI will not only fail to save the world, it also has the potential to do a lot of harm. In the biomedical setting, this may lead to wrong diagnoses, prognoses or even wrong treatments, with potentially disastrous consequences. In this study, you will examine three malpractices rampant in current biomedical AI practice. Using data garnered from selected publications, you will demonstrate the biases associated with the following malpractices:
(1) False normalization (a.k.a. test set bias), where class imbalance effects are irrevocably locked in and cannot be eradicated.
(2) Data leakage, where a single sample, sectioned into multiple samples, ends up in both training and test sets.
(3) Feature substitutability, where the AI will always do well, regardless of how it is trained or what it was trained with.
You will demonstrate that once these issues are resolved, the high reported prediction performance disappears. We will use the insights from this project to inform the community on how to develop better and more rigorous models.
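To make the data leakage malpractice concrete, the following is a minimal pure-Python sketch on fully hypothetical data (the patients, labels and features are all synthetic, and the 1-nearest-neighbour classifier is chosen only for simplicity, not taken from any of the publications above). Labels are assigned at random, so no model should beat chance; yet when sections of the same patient are allowed to fall into both training and test sets, the measured accuracy is heavily inflated.

```python
import random

random.seed(0)

def one_nn_accuracy(train, test):
    """Classify each test point by the label of its nearest training neighbour."""
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
        correct += (nearest[1] == y)
    return correct / len(test)

# 40 hypothetical "patients": a random label and a random feature vector,
# i.e. there is deliberately NO real signal to learn.
patients = []
for _ in range(40):
    label = random.choice([0, 1])
    core = [random.gauss(0, 1) for _ in range(5)]
    # each patient is sectioned into 3 near-identical samples
    sections = [([v + random.gauss(0, 0.01) for v in core], label)
                for _ in range(3)]
    patients.append(sections)

# Leaky split: shuffle individual sections, so sibling sections of the
# same patient land in both training and test sets.
all_sections = [s for p in patients for s in p]
random.shuffle(all_sections)
half = len(all_sections) // 2
leaky_acc = one_nn_accuracy(all_sections[:half], all_sections[half:])

# Correct split: keep all sections of a patient on the same side.
random.shuffle(patients)
train = [s for p in patients[:20] for s in p]
test = [s for p in patients[20:] for s in p]
clean_acc = one_nn_accuracy(train, test)

print(f"leaky split accuracy:         {leaky_acc:.2f}")  # inflated despite no signal
print(f"patient-level split accuracy: {clean_acc:.2f}")  # roughly chance level
```

The leaky split memorizes sibling sections rather than learning anything, which is exactly why the high reported performance vanishes once the split is done at the patient level.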