Hi all, at MLG (mlg.ulb.ac.be) we would like to share some results on the Pseudomonas aeruginosa dataset and get your feedback.
Beyond accuracy, we decided to focus on feature selection. To create predictive features from SMILES, we used the distance between each SMILES and the SMILES of known molecules, so that each known molecule defines one feature.
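A minimal sketch of what such distance-based features could look like. The distance actually used is not stated here, so this sketch assumes plain Levenshtein edit distance on the raw SMILES strings; a fingerprint-based similarity would be a common alternative. The function names and the example molecules are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_features(smiles, references):
    """One row per molecule, one column per reference molecule."""
    return [[edit_distance(s, r) for r in references] for s in smiles]


# Hypothetical toy inputs, not the SMILES used in the analysis.
references = ["CCO", "c1ccccc1"]   # ethanol, benzene
dataset = ["CCN", "c1ccccc1O"]     # ethylamine, phenol

X = distance_features(dataset, references)
for s, row in zip(dataset, X):
    print(s, row)
```

Each column of X is then a candidate feature named after its reference molecule, which is the kind of matrix a feature selection step would rank.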
We implemented this analysis: we took 70 SMILES of molecules known to have an effect on COVID (e.g. Hydroxychloroquine) and 70 random ones. It turned out that most of the time, and with a significant hypergeometric p-value, the most relevant features (as selected by mRMR) are the COVID-related ones, as we suspected, with the following ranking:
Moxifloxacin, Darunavir, Fluphenazine, Stavudine, Interferon Beta, Tradipitant, Cobicistat, Macrolide, Dexlansoprazole, Tesaglitazar, Ritonavir, Chloroquine phosphate, Ceftriaxone, Anakinra
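For anyone who wants to check the enrichment claim on their own selection, here is a rough sketch of a hypergeometric test: the probability of seeing at least k COVID-related features among n selected ones if the selection were random. The 140 and 70 come from the setup above; the numbers of selected features are hypothetical placeholders, not results, and the function name is made up for illustration.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when drawing n items without replacement from N items,
    K of which are 'successes'."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)   # comb returns 0 for impossible draws
        for i in range(max(k, 0), min(n, K) + 1)
    )
    return favourable / total


# 140 candidate features: 70 COVID-related and 70 random, as described above.
N_FEATURES, N_COVID = 140, 70

# Hypothetical outcome of one selection run (placeholder values, not results).
n_selected, n_covid_selected = 10, 9

p_value = hypergeom_upper_tail(N_FEATURES, N_COVID, n_selected, n_covid_selected)
print(f"p-value = {p_value:.4g}")
```

A small p-value means that such a high share of COVID-related features among the selected ones would be unlikely under random selection.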
Do you think this is a useful analysis direction? If you are interested, MLG can share the list of SMILES we used. Are there any molecules you would like us to test in this way?
Hi Gianluca, thank you for the post! These sound like interesting results. Could you provide a bit more detail so we can better understand your methods? How exactly are the features being selected, and what are they being used to predict? Is the goal to differentiate between the 70 COVID-related SMILES and the 70 random SMILES? Also, just to clarify: are the features you listed some of the 70 molecules known to have an effect against COVID, or are they something different? Thanks in advance!
Hi Kyle, we posted the code and up-to-date results at https://github.com/gbonte/mitchemio. Take a look and contact me directly if you have questions.
Hi Gianluca, thank you for posting the code! I'll take a look through and let you know if I have any questions.