The goal is to identify new antibiotics by building a model that can predict whether a given molecule will inhibit the growth of the bacterium E. coli.
Although the task of predicting E. coli inhibition isn’t directly related to COVID-19, methods that can successfully predict the efficacy of molecules against bacteria may be similarly successful when applied to predicting efficacy against viruses like SARS-CoV-2, the virus that causes COVID-19.
The model is trained on a dataset of 2,335 input/output pairs. Each input is a SMILES string, i.e., a text representation of a molecule such as C1=CC=CC=C1 for benzene, and each output is a 1 or a 0 indicating whether or not the molecule inhibits the growth of E. coli.
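To make the input/output format concrete, here is a minimal sketch of loading such pairs from a CSV file. The column names "smiles" and "activity", the helper name load_dataset, and the two sample rows are assumptions made for illustration; the labels are placeholders, not real measurements.

```python
import csv
import io

# Hypothetical file contents. The column names are assumptions and the
# labels are placeholders, not measured E. coli inhibition results.
SAMPLE_CSV = """smiles,activity
C1=CC=CC=C1,0
CCO,0
"""

def load_dataset(handle):
    """Read (SMILES string, binary label) pairs from an open CSV file."""
    pairs = []
    for row in csv.DictReader(handle):
        label = int(row["activity"])
        if label not in (0, 1):
            raise ValueError(f"expected a 0/1 label, got {label!r}")
        pairs.append((row["smiles"], label))
    return pairs

pairs = load_dataset(io.StringIO(SAMPLE_CSV))
for smiles, label in pairs:
    print(smiles, label)
```

With a real file, the same function could be called as load_dataset(open(path, newline="")), where path points to the dataset.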
The model used to predict E. coli inhibition is a type of neural network called a message passing neural network (MPNN). MPNNs are designed to operate on graph-structured objects like molecules, where each atom is represented by a node and each bond is represented by an edge.
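As an illustration of the graph view, the sketch below writes out benzene (C1=CC=CC=C1) by hand as a list of atoms (nodes) and a list of bonds (edges). In practice a cheminformatics toolkit would parse the SMILES string; the hand-built lists and the helper name build_adjacency are assumptions for illustration, and hydrogens are left implicit.

```python
# Nodes: the six ring carbons of benzene, indexed 0..5 in SMILES order.
atoms = ["C", "C", "C", "C", "C", "C"]

# Edges: (atom index, atom index, bond type), following C1=CC=CC=C1.
# The last entry is the ring-closure bond back to atom 0.
bonds = [
    (0, 1, "double"),
    (1, 2, "single"),
    (2, 3, "double"),
    (3, 4, "single"),
    (4, 5, "double"),
    (5, 0, "single"),
]

def build_adjacency(num_atoms, bond_list):
    """Map each atom index to a list of (neighbor index, bond type)."""
    adjacency = {i: [] for i in range(num_atoms)}
    for a, b, bond_type in bond_list:
        adjacency[a].append((b, bond_type))  # bonds are undirected,
        adjacency[b].append((a, bond_type))  # so store both directions
    return adjacency

adjacency = build_adjacency(len(atoms), bonds)
for index, neighbors in adjacency.items():
    print(index, atoms[index], neighbors)
```

Each carbon ends up with exactly two neighbors, one through a single bond and one through a double bond, which is the ring structure the MPNN operates on.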
An MPNN for molecules works by first creating a feature vector for each atom and bond based on simple properties like atom type (carbon, oxygen, etc.) and bond type (single, double, etc.). Then it performs a series of “message passing” steps in which a neural network sends information between neighboring atoms and bonds, thereby encoding local chemical information. After a number of these steps, the local chemical information is aggregated to form a single vector representing the entire molecule, which is then processed by a feed-forward neural network that makes the final property prediction.
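The three stages described above (featurize, pass messages, aggregate and predict) can be sketched in plain Python. This is a deliberately simplified toy, not the model from the referenced paper: it uses atom features only (bond features are omitted), untrained random weights, mean pooling, and a single linear layer with a sigmoid in place of a full feed-forward network. All function names, the three-element atom vocabulary, and the hyperparameters are assumptions for illustration.

```python
import math
import random

ATOM_TYPES = ["C", "N", "O"]  # assumed toy vocabulary

def one_hot(symbol):
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(matrix, v):
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def predict(atoms, bonds, steps=3, hidden=4, seed=0):
    """Return a score in (0, 1) for a molecule given as atoms and bonds."""
    rng = random.Random(seed)  # untrained weights, fixed by the seed
    w_in = random_matrix(hidden, len(ATOM_TYPES), rng)
    w_msg = random_matrix(hidden, hidden, rng)
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # 1. Initial hidden state per atom from its simple features.
    h0 = [relu(matvec(w_in, one_hot(s))) for s in atoms]
    h = h0

    # 2. Message passing: each atom sums its neighbors' states and updates.
    for _ in range(steps):
        updated = []
        for i in range(len(atoms)):
            message = [0.0] * hidden
            for j in neighbors[i]:
                message = vec_add(message, h[j])
            updated.append(relu(vec_add(h0[i], matvec(w_msg, message))))
        h = updated

    # 3. Readout: average atom states into one molecule vector, then score.
    mol_vec = [sum(column) / len(atoms) for column in zip(*h)]
    logit = sum(w * x for w, x in zip(w_out, mol_vec))
    return 1.0 / (1.0 + math.exp(-logit))

# Ethanol (CCO) written out by hand: C-C-O.
score = predict(["C", "C", "O"], [(0, 1), (1, 2)])
print(score)
```

Because the messages are summed and the readout is an average, the score does not depend on the order in which the atoms are listed, which is the property that makes this family of models suitable for graphs.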
The models achieved a test AUC of 0.896 ± 0.002 across six trials.
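For readers unfamiliar with this kind of summary, the sketch below shows one way a figure of the form "mean ± spread" could be computed from per-trial test scores. It assumes that AUC refers to the area under the ROC curve and that the ± value is a sample standard deviation; the source states neither. The function names and the toy labels and scores are made up for illustration and are not the experiment's data.

```python
import statistics

def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def summarize(aucs):
    """Return (mean, sample standard deviation) over trials."""
    return statistics.mean(aucs), statistics.stdev(aucs)

# Toy example: one set of test labels, two trials with made-up scores.
labels = [0, 0, 1, 1]
trial_scores = [
    [0.1, 0.4, 0.35, 0.8],
    [0.2, 0.3, 0.6, 0.9],
]
aucs = [roc_auc(labels, scores) for scores in trial_scores]
mean_auc, std_auc = summarize(aucs)
print(f"{mean_auc:.3f} ± {std_auc:.3f}")
```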
Yang, Kevin, et al. “Analyzing Learned Molecular Representations for Property Prediction.” Journal of Chemical Information and Modeling 59.8 (2019): 3370-3388. (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00237) - Paper describing the message passing neural network applied to a range of molecular properties.