Hi all, we at MLG (mlg.ulb.ac.be) would like to share some results on the Pseudomonas aeruginosa dataset and get your feedback.
Beyond accuracy, we decided to focus on feature selection. To create predictive features from SMILES, we used a notion of distance: each feature measures how far a molecule's SMILES is from the SMILES of a known molecule.
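To make the idea of distance-based features concrete, here is a minimal sketch. It is not our actual distance: as a stand-in it uses a Jaccard distance on character bigrams of the SMILES string, and the function names (`ngrams`, `jaccard_distance`, `distance_features`) and the three toy SMILES are purely illustrative assumptions. A chemistry-aware fingerprint distance would be the more natural choice in practice.

```python
def ngrams(smiles, n=2):
    """Return the set of character n-grams of a SMILES string."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def jaccard_distance(a, b, n=2):
    """Jaccard distance between the n-gram sets of two SMILES strings.

    Illustrative stand-in only: it compares raw strings, not chemistry.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:
        # Both strings are shorter than n: treat them as identical.
        return 0.0
    return 1.0 - len(ga & gb) / len(union)


def distance_features(query, references, n=2):
    """One feature per reference molecule: the distance from query to it."""
    return [jaccard_distance(query, ref, n) for ref in references]


# Toy reference set (hypothetical): ethanol, acetic acid, benzene.
references = ["CCO", "CC(=O)O", "c1ccccc1"]
features = distance_features("CCO", references)
print(features)
```

With this layout every reference molecule contributes one column of the feature matrix, so a feature selection method can rank the reference molecules themselves.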
We implemented the analysis as follows: we took 70 SMILES of known molecules reported to have an effect on COVID (e.g. Hydroxychloroquine) and 70 random ones. It turned out that most of the time (and with a significant hypergeometric p-value) the most relevant features (selected by mRMR) are the COVID-related ones, as we suspected, with the following ranking:
Moxifloxacin, Darunavir, Fluphenazine, Stavudine, Interferon Beta, Tradipitant, Cobicistat, Macrolide, dexlansoprazole, Tesaglitazar, ritonavir, Chloroquine phosphate, Ceftriaxone, Anakinra
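For anyone who wants to reproduce the enrichment check, the hypergeometric p-value can be sketched as below. This assumes the feature pool is the 140 reference molecules (70 COVID-related, 70 random) and asks how likely it is to see at least k COVID-related features among the n selected ones by chance. The function name `hypergeom_pvalue` and the values of n and k in the example are illustrative assumptions, not our reported numbers.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of features in the pool
    K: number of features of interest in the pool (here: COVID-related)
    n: number of features selected (e.g. the top of the mRMR ranking)
    k: number of selected features that are of interest
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the upper tail; comb() returns 0 for impossible combinations.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total


# Hypothetical example: 20 features selected, 14 of them COVID-related.
p = hypergeom_pvalue(k=14, N=140, K=70, n=20)
print(f"one-sided hypergeometric p-value: {p:.4g}")
```

A small p-value here only says that the selected set is enriched in COVID-related references compared with a random draw from the pool; it does not say anything about the activity of any single molecule.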
Do you think this is a useful direction for the analysis? If you are interested, MLG can share the list of SMILES we used. Are there any molecules you would like us to test in this way?