Uncertainty estimation


The goal is to augment regression models so that they estimate their own uncertainty or expected error, enabling us to identify the individual predictions in which we can be highly confident.

By focusing on the most confident predictions of ML models, we can cut the time and expense of empirical screening.


In every dataset, each training example pairs a molecular structure (a SMILES string) with an activity measurement. In addition to the property prediction, the model must output an estimate of its uncertainty.


We test the uncertainty estimation methods on a public dataset containing lipophilicity measurements for 4201 molecules. The dataset is available in compressed form on the Confidence branch of Chemprop’s GitHub repository.


We distinguish between four primary strategies for uncertainty estimation.

  1. Ensemble-based methods (https://arxiv.org/abs/1612.01474)

  2. Distance-based methods (https://www.ncbi.nlm.nih.gov/pubmed/30404432, https://pubs.rsc.org/en/content/articlelanding/2019/sc/c9sc02298h#!divAbstract)

  3. Mean variance estimation (https://ieeexplore.ieee.org/document/374138)

  4. Union-based methods (https://www.ijcai.org/Proceedings/15/Papers/503.pdf)
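As an illustration of the first strategy, the sketch below treats the spread of predictions across an ensemble as the uncertainty estimate. It is a minimal example and not Chemprop's implementation: the function name `ensemble_uncertainty` and the toy prediction values are hypothetical, and it assumes the predictions of each independently trained model are already available as an array.

```python
import numpy as np


def ensemble_uncertainty(member_predictions):
    """Return the ensemble mean and variance for each molecule.

    member_predictions: array-like of shape (n_models, n_molecules),
    one row per independently trained model (hypothetical input).
    """
    preds = np.asarray(member_predictions, dtype=float)
    mean = preds.mean(axis=0)      # ensemble prediction
    variance = preds.var(axis=0)   # disagreement between members, used as uncertainty
    return mean, variance


# Toy values: three models, two molecules.
member_predictions = [[1.0, 2.0],
                      [1.2, 2.0],
                      [0.8, 2.0]]
mean, variance = ensemble_uncertainty(member_predictions)
print(mean, variance)
```

In this toy case the models disagree on the first molecule and agree exactly on the second, so only the first receives a non-zero uncertainty.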


We test methods for their ability to rank predictions by error, mimic the distribution of prediction errors, and mimic the magnitude of prediction errors.
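Two of these criteria can be sketched with simple statistics. The example below is one possible reading, not the evaluation code used in this project: it assumes ranking ability is measured with a Spearman rank correlation between absolute error and uncertainty, and that agreement with the error distribution is checked by counting how many errors fall within one predicted standard deviation (roughly 68% if errors are Gaussian and the estimates are well calibrated). All function names and data values are hypothetical.

```python
import numpy as np


def spearman_rank_correlation(abs_errors, uncertainties):
    """Spearman correlation between absolute errors and uncertainties.

    Ranks come from a double argsort, so ties are broken by position
    rather than averaged (a simplification for this sketch).
    """
    err_ranks = np.argsort(np.argsort(abs_errors))
    unc_ranks = np.argsort(np.argsort(uncertainties))
    return float(np.corrcoef(err_ranks, unc_ranks)[0, 1])


def coverage_within_one_std(errors, predicted_std):
    """Fraction of errors whose magnitude is at most the predicted std."""
    errors = np.asarray(errors, dtype=float)
    predicted_std = np.asarray(predicted_std, dtype=float)
    return float(np.mean(np.abs(errors) <= predicted_std))


# Toy values in which uncertainty orders the errors perfectly.
abs_errors = np.array([0.1, 0.5, 0.2, 0.9])
uncertainties = np.array([0.2, 0.6, 0.3, 1.0])
rho = spearman_rank_correlation(abs_errors, uncertainties)
coverage = coverage_within_one_std(abs_errors, uncertainties)
print(rho, coverage)
```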


For each uncertainty estimation method, we record the RMSE on the lipophilicity dataset for the 100%, 50%, 25%, 10%, and 5% of test-set predictions with the lowest reported uncertainty. A strong uncertainty estimator should show a steep decline in RMSE as the selection becomes more restrictive.
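The selective RMSE evaluation described above can be sketched as follows. This is a minimal illustration, not the project's evaluation script: the function name `rmse_at_confidence_cutoffs` is hypothetical, the data are synthetic, and the rule for rounding the subset size is an assumption.

```python
import numpy as np


def rmse_at_confidence_cutoffs(y_true, y_pred, uncertainty,
                               fractions=(1.0, 0.5, 0.25, 0.10, 0.05)):
    """RMSE over the most confident fraction of predictions, per cutoff."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    order = np.argsort(uncertainty)            # lowest uncertainty first
    squared_errors = (y_pred - y_true)[order] ** 2
    results = {}
    for fraction in fractions:
        # Keep at least one prediction (rounding rule is an assumption).
        k = max(1, int(round(fraction * len(squared_errors))))
        results[fraction] = float(np.sqrt(squared_errors[:k].mean()))
    return results


# Synthetic test set whose true error scale equals the reported uncertainty.
rng = np.random.default_rng(0)
sigma = rng.uniform(0.1, 1.0, size=2000)
y_true = rng.normal(size=2000)
y_pred = y_true + rng.normal(size=2000) * sigma
results = rmse_at_confidence_cutoffs(y_true, y_pred, sigma)
for fraction, rmse in results.items():
    print(f"top {fraction:.0%}: RMSE = {rmse:.3f}")
```

Because the synthetic uncertainties are informative by construction, the RMSE shrinks as the cutoff tightens; an uninformative estimator would leave it roughly flat.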


[1] Chemprop (https://github.com/chemprop/chemprop) - GitHub repo containing code for the message passing neural network.


[2] Yang, Kevin, et al. “Analyzing Learned Molecular Representations for Property Prediction.” Journal of Chemical Information and Modeling. 59.8 (2019): 3370-3388. (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00237) - Paper describing the message passing neural network applied to a range of molecular properties.

[3] Scalia, Grambow, et al. “Evaluating Scalable Uncertainty Estimation Methods for DNN-Based Molecular Property Prediction.” (2019). (https://arxiv.org/abs/1910.03127) - Paper which outlines standard practices for evaluating uncertainty estimation methods.

[4] Tran, Neiswanger, et al. “Methods for comparing uncertainty quantifications for material property predictions.” (2019). (https://arxiv.org/abs/1912.10066) - Paper which evaluates the performance of many similar uncertainty estimation methods when applied to material property prediction.