Responsible algorithmic decision making

Identifying overfitting of ML models to genetic background

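Many machine learning algorithms assume that the samples in their training data are independent. In bacteria, no two samples are truly independent, because isolates share common ancestry. This is especially true in large collections of a single species, which often contain many closely related isolates from the same lineages.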
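
To make the independence problem concrete, here is a minimal sketch in plain Python that simulates a hypothetical clonal collection: a few ancestral genomes (lineages), each with many descendants that differ from their ancestor at only a handful of sites. For each held-out isolate it measures the distance to its closest training isolate. The genome length, number of lineages and mutation count are illustrative assumptions, and the function names are hypothetical; none of them come from the poster.

```python
import random

def hamming(a, b):
    """Number of sites at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def simulate_clonal_collection(n_lineages=4, per_lineage=25, block=50,
                               max_mutations=2, seed=1):
    """Toy clonal population: returns (genotypes, lineage_labels).

    Each ancestor carries 1s in its own block of `block` sites and 0s
    elsewhere, so any two ancestors differ at 2 * block sites. Each
    descendant flips at most `max_mutations` sites of its ancestor.
    """
    rng = random.Random(seed)
    length = n_lineages * block
    genotypes, lineages = [], []
    for lin in range(n_lineages):
        ancestor = [1 if lin * block <= i < (lin + 1) * block else 0
                    for i in range(length)]
        for _ in range(per_lineage):
            child = ancestor[:]
            # Flip between 0 and max_mutations distinct sites.
            for site in rng.sample(range(length), rng.randint(0, max_mutations)):
                child[site] ^= 1
            genotypes.append(child)
            lineages.append(lin)
    return genotypes, lineages

def nearest_train_distance(genotypes, train_idx, test_idx):
    """For each test isolate, the distance to its closest training isolate."""
    return [min(hamming(genotypes[t], genotypes[r]) for r in train_idx)
            for t in test_idx]

genotypes, lineages = simulate_clonal_collection()
order = list(range(len(genotypes)))
random.Random(0).shuffle(order)

# Random split: 25 isolates from any lineage are held out.
random_test, random_train = order[:25], order[25:]
# Lineage-aware split: every isolate of lineage 3 is held out.
lineage_test = [i for i in order if lineages[i] == 3]
lineage_train = [i for i in order if lineages[i] != 3]

random_d = nearest_train_distance(genotypes, random_train, random_test)
lineage_d = nearest_train_distance(genotypes, lineage_train, lineage_test)
print("random split: worst nearest-neighbour distance =", max(random_d))
print("lineage split: best nearest-neighbour distance =", min(lineage_d))
```

Two descendants of the same ancestor differ at no more than 2 × max_mutations = 4 sites, so under the random split essentially every test isolate has a near-identical relative in the training set (the only exception would be a draw that happened to hold out one whole lineage). Every isolate of the held-out lineage, by contrast, is at least 2 × block − 2 × max_mutations = 96 sites from anything in training. A random split therefore mostly tests a model on clones it has already seen.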

​

This lack of independence can have serious consequences: the model may learn genetic background features that mark the clones carrying your phenotype of interest, rather than the genetic mechanisms that actually cause the phenotype. As a result, it can behave unpredictably when it encounters bacteria from a different genetic background, which is more likely in collections from new countries or populations.
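
The toy example below, a minimal sketch in plain Python, shows how this can happen and how holding out whole lineages can expose it. It assumes a hypothetical collection of three lineages described by two binary features: a lineage marker, and a resistance gene that actually causes the phenotype. Lineage 2 carries the gene in a different genetic background without the marker. The one-feature "model" (a decision stump) and leave-one-lineage-out evaluation are illustrative choices, not the method used in the poster.

```python
# Hypothetical binary features per isolate: [lineage_marker, resistance_gene].
FEATURES = ["lineage_marker", "resistance_gene"]

def make_collection():
    """Toy collection of (lineage, features, resistant) records.

    Lineage 0: marker and resistance gene present -> resistant.
    Lineage 1: neither present -> susceptible.
    Lineage 2: gene present in a different background, no marker -> resistant.
    """
    data = []
    for lineage, marker, gene in [(0, 1, 1), (1, 0, 0), (2, 0, 1)]:
        for _ in range(10):
            # The phenotype is caused by the gene, not by the marker.
            data.append((lineage, [marker, gene], gene))
    return data

def accuracy(records, feature):
    """Accuracy of the one-feature rule 'resistant iff feature == 1'."""
    return sum(x[feature] == y for _, x, y in records) / len(records)

def fit_stump(records):
    """Pick the single feature with the best training accuracy.

    max() returns the first feature on ties, so when the marker and the
    gene are perfectly confounded in training, the marker is chosen.
    """
    return max(range(len(FEATURES)), key=lambda f: accuracy(records, f))

def leave_one_lineage_out(records):
    """Train on all lineages but one, then test on the held-out lineage."""
    results = {}
    for held_out in sorted({lin for lin, _, _ in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        f = fit_stump(train)
        results[held_out] = (FEATURES[f], accuracy(test, f))
    return results

for lineage, (feature, acc) in leave_one_lineage_out(make_collection()).items():
    print(f"held out lineage {lineage}: chose {feature}, test accuracy {acc:.2f}")
```

Running it prints:

```
held out lineage 0: chose resistance_gene, test accuracy 1.00
held out lineage 1: chose resistance_gene, test accuracy 1.00
held out lineage 2: chose lineage_marker, test accuracy 0.00
```

When only lineages 0 and 1 are available for training, the marker and the gene predict the phenotype equally well, so training accuracy cannot tell background from mechanism, and the model fails completely on the new background. A random split that placed at least one lineage-2 isolate in training would let the gene win, which is why only the lineage-held-out fold reveals the problem.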

[Image: Wheeler, ECCMID 2019 poster]

To learn more, see this poster from ECCMID 2019 (European Congress of Clinical Microbiology & Infectious Diseases).
