SARS-CoV-1 3CLpro Task

Task

The task is to train a model that predicts the probability that a given input molecule is active against the SARS-CoV-1 3CLpro target.

As the SARS-CoV-1 3CLpro target is highly homologous to the corresponding protease in SARS-CoV-2 (the virus that causes COVID-19), we hope that a model that identifies molecules active against SARS-CoV-1 3CLpro will also help identify molecules active against SARS-CoV-2 3CLpro.

Data

The training data is derived from PubChem assay AID 1706, which measures SARS-CoV-1 3CLpro activity: https://pubchem.ncbi.nlm.nih.gov/bioassay/1706. It contains ~400 positives and ~300K negatives. Evaluation is conducted on a combination of two data sources, with any molecules that overlap the training set removed: (1) a verified reference list of actives obtained from the creator of the original assay, all labeled as active, and (2) the Broad repurposing library (https://clue.io/repurposing), whose compounds are all labeled as inactive. The resulting evaluation set has 41 positives and ~6K negatives.

Training data: https://github.com/yangkevin2/coronavirus_data/blob/master/data/AID1706_binarized_sars.csv

Evaluation data: https://github.com/yangkevin2/coronavirus_data/blob/master/data/evaluation_set_v2.csv
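
The construction of the evaluation set described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual pipeline: the function name build_evaluation_set and the toy SMILES strings are hypothetical, molecules are compared by exact string match (a real pipeline would likely canonicalize the structures first), and a molecule that appears in both sources is assumed to keep the active label.

```python
def build_evaluation_set(train_smiles, reference_actives, repurposing_library):
    """Combine two sources into (smiles, label) pairs, dropping training overlaps.

    Assumption: molecules are identified by exact string match. Reference
    actives are processed first, so a molecule present in both sources is
    kept once with the active label.
    """
    seen = set(train_smiles)  # anything already in training is excluded
    labeled = [(s, 1) for s in reference_actives]
    labeled += [(s, 0) for s in repurposing_library]

    evaluation = []
    for smiles, label in labeled:
        if smiles in seen:
            continue  # training overlap or duplicate
        seen.add(smiles)
        evaluation.append((smiles, label))
    return evaluation


# Toy data (hypothetical, for illustration only).
train = ["CCO", "CCN"]
actives = ["CCO", "c1ccccc1O"]
library = ["CCN", "CCC", "c1ccccc1O", "CCCl"]

evaluation_set = build_evaluation_set(train, actives, library)
for smiles, label in evaluation_set:
    print(smiles, label)
```

In this toy run, CCO and CCN are dropped because they appear in the training list, and c1ccccc1O is kept once as an active even though it also appears in the library.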

Model

Our baseline method is the same as for the antibiotics task: the graph convolutional network Chemprop[1,2], augmented with 2D features computed by RDKit[3]. We additionally run a variant of Chemprop that randomly samples from the negatives during each training epoch to keep the classes balanced. Each method is run as an ensemble of 5 models. The metric is ROC-AUC on the evaluation set.
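
The class-balanced variant can be pictured with the following minimal sketch. It assumes a 1:1 ratio, where every epoch uses all positives plus an equally sized fresh random draw of negatives; the exact ratio and batching used by Chemprop are not stated here, and the function name balanced_epoch and the placeholder molecule names are hypothetical.

```python
import random


def balanced_epoch(positives, negatives, rng):
    """Return one epoch of (molecule, label) pairs with balanced classes.

    Assumption: a 1:1 ratio, i.e. all positives plus an equal number of
    negatives drawn without replacement for this epoch only.
    """
    k = min(len(positives), len(negatives))
    sampled_negatives = rng.sample(negatives, k)
    epoch = [(m, 1) for m in positives] + [(m, 0) for m in sampled_negatives]
    rng.shuffle(epoch)  # mix the classes before batching
    return epoch


rng = random.Random(0)  # fixed seed so the sketch is reproducible
positives = [f"pos_{i}" for i in range(4)]      # stands in for ~400 actives
negatives = [f"neg_{i}" for i in range(3000)]   # stands in for ~300K inactives

# A new subset of negatives is drawn for every epoch.
epochs = [balanced_epoch(positives, negatives, rng) for _ in range(3)]
for epoch in epochs:
    print(len(epoch), sum(label for _, label in epoch))
```

Because the negatives are redrawn each epoch, the model still sees many different negatives over the course of training, while no single epoch is dominated by the majority class.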

Results

The Chemprop model achieved a ROC-AUC of 0.961 on the evaluation set.

The Chemprop model with class balance achieved a ROC-AUC of 0.978 on the evaluation set.
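
For readers unfamiliar with the metric, the sketch below shows one way an ensemble ROC-AUC could be computed. It assumes the ensemble prediction is the mean of the individual models' predicted probabilities, which is a common choice but is not confirmed by the text above. The helper names ensemble_mean and roc_auc and the toy scores are hypothetical, and the numbers are not related to the reported results.

```python
def ensemble_mean(per_model_predictions):
    """Average predicted probabilities across models, molecule by molecule."""
    n_models = len(per_model_predictions)
    return [sum(scores) / n_models for scores in zip(*per_model_predictions)]


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative, counting ties as half (pairwise Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predictions from two hypothetical models.
labels = [1, 0, 1, 0, 0]
model_a = [0.9, 0.2, 0.4, 0.5, 0.1]
model_b = [0.7, 0.4, 0.6, 0.3, 0.1]

ensemble_scores = ensemble_mean([model_a, model_b])
print(round(roc_auc(labels, model_a), 3))          # 0.833
print(round(roc_auc(labels, ensemble_scores), 3))  # 1.0
```

The pairwise loop is quadratic, which is acceptable for an evaluation set of this size (41 positives against ~6K negatives); a rank-based implementation would scale better on larger sets.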

References

[1] Chemprop (https://github.com/chemprop/chemprop) - GitHub repository containing the code for the message passing neural network.

[2] Yang, Kevin, et al. “Analyzing Learned Molecular Representations for Property Prediction.” Journal of Chemical Information and Modeling 59.8 (2019): 3370-3388. (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00237) - Paper describing the message passing neural network applied to a range of molecular properties.

[3] Landrum, Greg. “RDKit: Open-source cheminformatics.” (2006): 2012. (https://www.rdkit.org/) - Open-source package for computational chemistry.
